Progress and Challenges in CFD Methods and Algorithms

(AGARD  CP-578) 


Executive  Summary 


Computational  Fluid  Dynamics  (CFD)  now  plays  an  essential  role  in  the  design  of  aerospace  vehicles. 
The  ability  of  numerical  methods  to  accurately  simulate  complex  external  and  internal  aerodynamic 
flows  is  crucial  to  the  success  of  these  methods  in  the  design  process,  and  for  airplanes  leads  to 
improved  performance,  agility  and  maneuverability. 

In  the  last  decade,  considerable  progress  has  been  made  in  the  development  of  numerical  methods 
related  to  CFD.  As  a  result,  various  promising  CFD  schemes  and  algorithms  have  been  developed. 
However,  they  are  not  currently  used  in  industrial  codes.  At  the  same  time,  new  developments  in 
computer  hardware  and  architectures  have  led  to  significant  advances  in  parallel  computing  and 
multiprocessing.  These  topics,  which  are  considered  likely  to  constitute  pacing  items  and  new 
challenges  in  CFD  in  the  near  future,  formed  the  framework  for  the  program  for  this  Symposium. 

The  following  subjects  were  addressed:  parallel  computing,  advanced  spatial  discretization  techniques, 
unstructured,  hybrid  and  overlapping  grids,  adaptive  meshes,  fast  implicit  and  iterative  solvers,  large 
eddy  and  direct  numerical  simulations  of  turbulent  flows,  chemically  reacting  flows  and  unsteady 
aerodynamics.  Interesting  and  new  aspects  of  techniques  involving  these  subjects  were  discussed, 
substantiating  their  extended  potential  and  improved  capabilities.  Several  important  directions  of 
research  such  as  aerodynamic  shape  optimization  and  multidisciplinary  analysis  and  design  were 
identified,  which  should  be  the  subject  of  intensive  advanced  research  in  the  near  future. 

The  Symposium  provided  a  very  valuable  opportunity  for  exchange  of  information  about  recent 
developments  and  achievements.  It  can,  therefore,  be  expected  to  significantly  contribute  to  future 
important  progress  in  the  advancement  of  numerical  techniques  used  in  the  design  of  aerospace  vehicles 
and  other  flying  objects. 

Jean-Andre  Essers 
Programme  Committee  Chairman 
Progress and Challenges in CFD Methods
and Algorithms

(AGARD CP-578)


Summary

Computational fluid dynamics (CFD) now plays an essential role in the design of aerospace
vehicles. The ability of numerical methods to simulate complex internal and external
aerodynamic flows accurately is essential to the success of these methods in the design
process and, for aircraft, makes it possible to improve performance, agility and
maneuverability.

Over the last decade, considerable progress has been made in the development of numerical
methods related to CFD. As a result, various promising CFD algorithms and methods have been
developed. However, they have not been integrated into industrial codes. At the same time,
new developments in computer hardware and architectures have enabled appreciable advances
in parallel computing and multiprocessing. These topics, which are considered likely to
constitute the milestones and new challenges of CFD in the near future, formed the
framework of the program of this symposium.

The following subjects were examined: parallel computing, advanced spatial discretization
techniques, unstructured, hybrid and overlapping grids, adaptive meshes, fast implicit and
iterative solvers, large eddy simulation and direct numerical simulation of turbulent
flows, chemically reacting flows and unsteady aerodynamics.

Relevant discussions took place on new and interesting aspects of techniques relating to
these subjects, confirming the extension of their potential and the improvement of their
capabilities. Several important directions for research, such as aerodynamic shape
optimization and multidisciplinary analysis and design, were identified as requiring
intensive advanced research in the near future.

The symposium provided an invaluable opportunity to exchange information on recent
achievements and developments. It should therefore make a significant contribution to
future important progress in the advancement of numerical techniques for the design of
aerospace vehicles and other flying objects.

Jean-Andre Essers
Programme  Committee  Chairman 
Technical Evaluation Report
AGARD Fluid Dynamics Panel Symposium on
"Progress and Challenges in CFD Methods and Algorithms"

N.  Kroll 

Institute  of  Design  Aerodynamics 
DLR,  38108  Braunschweig 
Lilienthalplatz  7,  Germany 


SUMMARY 

The Fluid Dynamics Panel of AGARD conducted a Symposium on
"Progress and Challenges in CFD Methods and Algorithms" in
Seville, Spain, on October 2-5, 1995. The purpose of this
symposium was to identify and discuss topics which are likely to
constitute pacing items and challenges in Computational Fluid
Dynamics. Sessions were devoted specifically to parallel
computing, advanced discretization schemes and advanced grid
structures. Topics also included adaptive meshes, fast iterative
methods and algorithmic aspects of the computation of reacting
flows and unsteady flows. In this evaluation report an attempt is
made to point out the critical issues for each particular subject
and to assess how far they were addressed by the conference
papers. Some general concluding remarks and recommendations are
given.

1.  INTRODUCTION 

The 77th Meeting of the AGARD Fluid Dynamics Panel was held from
the 2nd to the 5th of October, 1995, in Seville, Spain. The
symposium was focused on "Progress and Challenges in CFD Methods
and Algorithms". The background and need for such a meeting were
stated in the call for papers:

"The design of aerospace vehicles strongly depends on the
ability of numerical methods to simulate complex flow fields.
In the last decade, considerable progress has been made in the
development of numerical methods related to CFD. As a result,
various promising CFD schemes and algorithms have been
developed which are not yet currently used in industrial
codes. At the same time, new developments in computer hardware
and architectures have led to significant advances in parallel
computing and multiprocessing."

It must also be stated that, despite the recent advances, CFD
still suffers from deficiencies in accuracy, robustness and
efficiency for complex applications, such as complete aircraft
flow predictions. From the aeronautical industry's point of view,
CFD is expected to deliver:

-  detailed viscous flow analysis for complex geometries at
realistic Reynolds numbers

-  accurate prediction of aerodynamic data

-  fast response time per flow case at acceptable total costs

-  aerodynamic optimization of aircraft components/complete
aircraft

-  interdisciplinary analysis of aircraft (aerodynamics
+ structure + flight control)

In  order  to  meet  these  requirements,  improvements  in 
CFD  are  needed  in  all  areas. 

Based  on  this,  the  aim  and  scope  of  the  symposium  were 
set  by  the  program  committee  in  the  call  for  papers  as 
follows: 

"The symposium will focus on those topics which are likely to
constitute pacing items and new challenges in CFD. Its aim is
to bring together scientists and engineers working on new
numerical developments in different fields of interest to the
aerospace sciences and industrial communities.

Papers may address a broad range of research fields of current
interest. A list of possible topics includes (but is not
limited to) the following:

•  Unstructured grid, hybrid, adaptive, multiblock and grid
embedding methods and algorithms

•  Implicit and iterative methods for Euler and Navier-Stokes
equations, fast iterative solvers (multi-grid, Krylov subspace
techniques)

•  Numerical techniques for parallel computing and
multiprocessing

•  Advances in accurate capturing techniques for shock waves
and contact discontinuities, TVD high resolution schemes,
multidimensional upwinding

•  Numerical algorithms and problems specifically related to
the implementation of turbulence models and to the simulation
of nonequilibrium chemically reacting flows

•  Numerical accuracy assessment.

In order to limit the scope of the symposium, papers
essentially devoted to grid generation techniques and
turbulence or chemistry modelling are not encouraged."

The symposium spanned three and one-half days and the program
listed 38 technical papers coming from 13 countries, of which 36
papers were presented. The program committee organized a keynote
session, eight major sessions and a general discussion at the end
of the meeting. The following table presents the topics covered
by the symposium. Although many papers addressed several topics,
they are categorized in this table based on their central focus.
The papers are listed in chapter 4 in the order of their
presentation.


Table 1: Topics covered by the symposium

topic                                    reference
---------------------------------------  ---------------------------------------
invited papers                           [1], [2], [3]
parallel computing                       [4], [5], [6], [7], [21], [25], [36]
advanced spatial discretization schemes  [9], [10], [12], [14], [15], [16], [17]
unstructured grids, hybrid grids,        [8], [11], [28], [29]
overlapping grids, meshless techniques
adaptive schemes                         [13], [20], [24], [34]
fast implicit and iterative solvers      [18], [19], [26]
turbulent flows, LES / DNS               [22], [23]
chemically reacting flows                [30], [31]
unsteady flows                           [27], [32], [35], [33]

It is worthwhile to note that the majority of the papers were
concerned with parallelization of CFD methods, use of more
flexible grid structures and development of advanced
discretization schemes including adaptive methods. This may
reflect the contemporary trends of CFD research in most
aeronautical companies, government research laboratories and
universities. Surprisingly, except for the keynote paper by
Jameson, no technical paper addressed optimization and
interdisciplinary analysis which, in the author's opinion, are
major challenges in CFD.

The evaluation undertaken in this report attempts to cover two
aspects. Chapter 2 comprises summaries of the presented papers
for each topic given in table 1. It is not intended to give an
extensive review of all individual papers; instead, for each
particular subject the aim is to identify the critical issues and
to assess how far they were addressed by the papers. In chapter
3, concluding remarks are presented indicating to what degree the
aims of the meeting and the needs of the aerospace community were
met. Furthermore, recommendations arising from the meeting are
given.

2.  SYNOPSIS  OF  THE  PAPERS 

With respect to the theme of the meeting "Progress and Challenges
in CFD Methods and Algorithms", in the evaluator's opinion, many
papers of high quality were given, which represent the current
status of CFD, focus on unresolved issues and present new
important directions of development to overcome current
deficiencies. On the other hand, many papers of lower quality
were presented. Some of them did not meet the main focus of the
symposium, several others did not reflect the present status of
CFD or were largely redoing or reinventing well established
topics that have been known in the literature for some time.

2.1  Invited  Papers 

Keynote papers were provided by A. Jameson, P. Rubbert and
D. Knight.

Jameson [1] gave an excellent overview of the present status,
challenges and future developments in computational fluid
dynamics. He addressed the essential requirements that numerical
simulations must meet for effective industrial use. Assured
accuracy, acceptable computational and human costs as well as
fast turn around were identified as major issues. In his opinion,
more sophisticated algorithms are required in order to
substantially reduce computational costs. Improved methods should
include higher order schemes, advanced acceleration methods, fast
inversion methods for implicit schemes and the effective
exploitation of massively parallel computers. The paper reviewed
modern numerical methods and addressed several issues in
algorithm design. In particular, a unified approach to the design
of accurate and efficient shock capturing algorithms was
presented. Some examples of state-of-the-art calculations, which
can be performed in an industrial environment, were given.
Jameson pointed out that besides the transition to more
sophisticated algorithms, the present challenge is to extend the
effective use of CFD techniques to more complex applications. As
key problems, he identified turbulent flows at Reynolds numbers
associated with full scale flight, chemically reacting flows,
combustion and unsteady flows. Furthermore, multidisciplinary
analysis, aerodynamic shape optimization and, in the long run,
multidisciplinary optimization were designated as important
future target areas of CFD. In his presentation, Jameson outlined
a very promising technique for efficient three-dimensional shape
optimization based on control theory. He demonstrated a
successful design of a swept wing with very low wave drag within
40 design iterations. In this example, the flow was modeled by
the Euler equations. He mentioned that with this technique, even
in the case of three-dimensional flows, the computational
requirements are so moderate that the calculations can be
performed with workstations such as the IBM RISC 6000 series.
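
The appeal of a control-theory (adjoint) formulation can be
illustrated on a toy problem. The sketch below is not Jameson's
method or code: it assumes a small linear "state equation"
A(alpha) u = b standing in for the flow solver, a least-squares
cost function, and hypothetical function names. It shows only the
general idea that one state solve plus one adjoint solve yields
the gradient with respect to all design variables at once.

```python
import numpy as np

def solve_state(alpha):
    """Toy stand-in for a flow solve: A(alpha) u = b (assumed model problem)."""
    n = len(alpha)
    A = (2.0 * np.eye(n) - 0.5 * np.eye(n, k=1) - 0.5 * np.eye(n, k=-1)
         + np.diag(alpha))          # design variables enter on the diagonal
    b = np.ones(n)
    return A, np.linalg.solve(A, b)

def cost(alpha, target):
    """Cost J = 0.5 * ||u(alpha) - target||^2."""
    _, u = solve_state(alpha)
    return 0.5 * np.sum((u - target) ** 2)

def adjoint_gradient(alpha, target):
    """Gradient of J with respect to every alpha_i from one adjoint solve."""
    A, u = solve_state(alpha)
    # Adjoint equation: A^T psi = dJ/du = u - target
    psi = np.linalg.solve(A.T, u - target)
    # dA/dalpha_i has a single 1 at (i, i), so dJ/dalpha_i = -psi_i * u_i
    return -psi * u
```

A finite-difference gradient would need one additional state
solve per design variable, whereas the adjoint route needs a
fixed number of solves regardless of how many design variables
there are.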

In  summary,  the  invited  paper  delivered  by  Jameson 
gave  a  precise  outline  of  the  scope  of  the  symposium 
and  the  expected  outcome  of  the  meeting. 

In his presentation [2], Rubbert focused his remarks on
challenges and pacing items in CFD that extend beyond the
technical ones. He pointed out that the key to developing better
airplanes or better CFD is the same, namely to analyze,
understand and improve the processes by which airplanes or CFD
are created. Rubbert called the process by which CFD capabilities
are created the research engine. Such a research engine involves
industry, academia and government, and the three components
interact with each other as a system. In the past this system
functioned quite well, but, in his opinion, it has been almost
disconnected from the customers of CFD research, namely the
practicing design engineers. Impressive results of research have
been achieved, but they were not necessarily applicable by
industry. The paper pointed out many principal characteristics
and attributes which an improved, properly functioning research
engine should have. The leading principles are customer focus and
customer satisfaction. Two further key factors were identified
which will pace the change of the research engine. The first is a
two-way, more intensive communication between the research
community and the engineering community in industry. The second
is a modification of the evaluation system of the research work
towards more industrial applicability. As stated by Rubbert, this
is the responsibility of the money givers who inhabit the
research engine.

In summary, Rubbert's paper gave a general critical assessment of
today's system of research and its stage of change. His
observations represent the pragmatic point of view of industry,
in which researchers' interest in basic scientific findings
receives less emphasis. This paper makes CFD researchers
sensitive to industrial needs, but some specific views of the
aeronautical industry on the status of CFD and future
requirements would have been desirable.

The paper by Knight [3] presented an overview of parallel
computing in computational fluid dynamics. In the first part of
the paper the basics of parallel computing were addressed,
including the introduction of the distinct levels of parallelism,
the classification of parallel computer architectures and the
description of the two basic programming paradigms, namely
message passing and data parallelism. The second part focused on
several key issues in the context of code development for
parallel computing. Dynamic load balancing and scalability were
identified as critical issues for complex CFD applications
carried out on massively parallel computers. Furthermore, a major
concern of parallel computing is portability. Here, Knight
discussed current research activities, including the development
of message passing standards (e.g. PVM, MPI) and data parallel
programming language standards (e.g. HPF). In his presentation,
Knight pointed out that in the U.S. the aerospace industry has
taken a leading role in the application of parallel computing to
practical analysis and design. In the past few years several
major aerospace corporations have developed extensive networks of
workstations for routine applications. Several examples were
given in the paper. Knight's presentation provided a basic
introduction to the field of parallel computing. The fundamental
terminology was explained and all critical issues were discussed.
Therefore, the paper was very helpful for the understanding and
assessment of the following technical papers which dealt with
parallelization. Unfortunately, the paper did not discuss the
potential and limitations of high performance parallel computers
for tackling large scale applications or new challenges in CFD.
Results were only presented for networks of loosely coupled
workstations.

Although papers on grid generation techniques were explicitly not
encouraged by the call for papers, an invited paper on status and
progress of both structured and unstructured grid generation for
complex configurations might have been desirable. The turn around
time and accuracy of the numerical simulation of industrial
applications very often depend on the capability of the available
grid generation procedure. Therefore, for the critical assessment
of numerical algorithms using structured, unstructured or hybrid
meshes, the capabilities and limits of the underlying grid
generation techniques have to be taken into account.

2.2  Parallel  Computing

Parallel computing is an important means to cut down turn around
time and computational costs of large scale applications.
Furthermore, it is believed that the exploitation of massively
parallel computing is the key to tackling new grand challenges in
CFD such as multidisciplinary analysis and optimization. In the
last several years a wide variety of parallel architectures have
become available which differ in the design of the CPUs (vector
versus RISC processor), the memory organization (e.g. shared
versus distributed memory) and the communication system (hardware
and software). For the future some of the vendors promise a
substantial increase of computational power in both memory and
CPU. One of the main issues in parallel computing is the design
of numerical algorithms which efficiently exploit the
capabilities of the parallel hardware. Especially in the case of
distributed memory machines, this is a nontrivial task. The
important aspects in designing parallel algorithms for these
architectures are partitioning of data and computation among the
processors, communication at the internal boundaries, load
balancing and overhead due to communication and extra
computations. Simpler algorithms, such as explicit schemes,
parallelize quite easily and they lead to high performance on
most parallel computers. However, due to their poor convergence
rates they are overall much less efficient than implicit schemes,
although the latter generally perform far below the peak of the
parallel machines due to the more intensive and more global
communication involved. The adjustment and further development of
more sophisticated algorithms such as multigrid and domain
decomposition methods on parallel architectures are very
promising. In contrast to explicit schemes, they provide global
distribution of information, but in a much more efficient way
than traditional implicit schemes. Further research in this
direction is needed in order to efficiently exploit the
capabilities of parallel computers.

In this symposium, papers [4], [5], [6], [7], [21], [25] and [36]
dealt mainly with parallel computing and covered various aspects
thereof. The paper by Eisfeld et al. [4] stressed the issue of
portability. They described the portable parallelization of a
state-of-the-art block-structured multigrid solver for industrial
CFD applications. Portability is achieved through the use of a
message passing based high level communication library. This
library supports any operation which is necessary in parallel
mode and involves communication between different processes.
Performance measurements on a large variety of computers of
different architectures demonstrated the comprehensive
portability of the code. Applications included inviscid
computations for a generic aircraft consisting of
wing/body/pylon/engine and viscous calculations for a wing-body
configuration on a computational mesh with 6.6 million points.
The paper showed that the complexity of today's problems in
applied aerodynamics can be tackled with parallel computers. It
also revealed the necessity for an automatic and effective load
balancing tool that allows the mapping of an initial block
structure to a higher number of processors than given blocks.
Details on the parallel efficiency of the multigrid method used
in the applications were not given.
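
The load balancing problem mentioned here can be made concrete
with a small sketch. The code below is an illustration only and
is not the tool discussed in [4]: it assumes that the work per
block is proportional to its number of grid points, uses a simple
greedy heuristic (largest block to the least loaded processor),
and all function names are hypothetical.

```python
import heapq

def assign_blocks(block_sizes, n_procs):
    """Greedy mapping: take blocks from largest to smallest and give each
    one to the currently least loaded processor."""
    heap = [(0, p) for p in range(n_procs)]   # (current load, processor id)
    heapq.heapify(heap)
    owner = {}
    order = sorted(range(len(block_sizes)), key=lambda i: -block_sizes[i])
    for b in order:
        load, p = heapq.heappop(heap)
        owner[b] = p
        heapq.heappush(heap, (load + block_sizes[b], p))
    loads = [0] * n_procs
    for b, p in owner.items():
        loads[p] += block_sizes[b]
    return owner, loads

def balance_efficiency(loads):
    """Mean load divided by maximum load; 1.0 means perfect balance."""
    return sum(loads) / (len(loads) * max(loads))
```

With fewer blocks than processors some processors necessarily
stay idle under such a mapping, which is why a practical tool
also has to split blocks before distributing them.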

The papers by Wissink et al. [6], Dias d'Almeida et al. [7] and
Badcock et al. [36] focused on the parallel implementation of
implicit Euler/Navier-Stokes solvers. In [6] for example, two
modifications of the well known implicit LU-SGS scheme
(Lower-Upper Symmetric Gauss-Seidel) were presented. The first
replaces the Gauss-Seidel sweep in LU-SGS with a Jacobi-like
sweep which only requires nearest neighbor communication and is
therefore easy to parallelize. The second one is a hybrid
approach that couples the global Jacobi type communication with
the more efficient Gauss-Seidel sweep on each subdomain. In both
strategies multiple sweeps are required in each subdomain in
order to maintain the convergence behavior of the baseline LU-SGS
method. Both strategies have been investigated in detail with
respect to parallel speed-up, convergence rate and computational
efficiency. Inviscid calculations for 3-D hovering helicopter
blades demonstrated that the hybrid strategy is a promising
implicit scheme for parallel computers with a smaller number of
powerful processors.
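
The trade-off between Jacobi-like and Gauss-Seidel sweeps can be
seen on a scalar model problem. The sketch below is not the
LU-SGS scheme of [6]; it assumes a 1-D Poisson equation with zero
boundary values, a single sweep per iteration, and hypothetical
function names. One routine covers plain Jacobi (every point is
its own subdomain), plain Gauss-Seidel (one subdomain) and a
hybrid in which interface data are lagged as in a Jacobi sweep
while Gauss-Seidel is used inside each subdomain.

```python
import numpy as np

def sweep(u, f, h, n_sub):
    """One relaxation sweep for -u'' = f with zero Dirichlet end values.

    n_sub == len(u): Jacobi; n_sub == 1: Gauss-Seidel; otherwise hybrid.
    """
    n = len(u)
    old = u.copy()                      # values from the previous sweep
    new = u.copy()
    bounds = np.linspace(0, n, n_sub + 1).astype(int)
    for lo, hi in zip(bounds[:-1], bounds[1:]):
        for i in range(lo, hi):
            if i == 0:
                left = 0.0
            elif i - 1 >= lo:
                left = new[i - 1]       # already updated inside this subdomain
            else:
                left = old[i - 1]       # lagged value across the interface
            right = 0.0 if i == n - 1 else old[i + 1]
            new[i] = 0.5 * (left + right + h * h * f[i])
    return new

def residual_norm(u, f, h):
    up = np.concatenate(([0.0], u, [0.0]))
    return np.linalg.norm(f - (2 * up[1:-1] - up[:-2] - up[2:]) / (h * h))

def iterations_to_converge(n, n_sub, tol=1e-6, max_it=100000):
    """Number of sweeps needed to reduce the residual norm by the factor tol."""
    h = 1.0 / (n + 1)
    f = np.ones(n)
    u = np.zeros(n)
    r0 = residual_norm(u, f, h)
    for it in range(1, max_it + 1):
        u = sweep(u, f, h, n_sub)
        if residual_norm(u, f, h) <= tol * r0:
            return it
    return max_it
```

Calling `iterations_to_converge(16, 16)`, `iterations_to_converge(16, 1)`
and `iterations_to_converge(16, 4)` compares Jacobi, Gauss-Seidel
and the hybrid on 16 points. On this model problem the Jacobi
sweep needs roughly twice as many iterations as Gauss-Seidel,
and the hybrid needs fewer than Jacobi while communicating only
at subdomain interfaces.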

The presentations delivered by Pinelli et al. [5] and Streng et
al. [21] addressed the parallelization of algorithms for DNS and
LES. In [21] the various aspects of the parallel implementation
of a typical higher order DNS solver based on domain
decomposition were discussed. The intrinsic or algorithmic
efficiency has been defined (see also [4]), which deals with the
parallelizability of a given algorithm, regardless of the
machine. Based on some analysis, the authors showed that due to
extra floating point operations at inner block boundaries the
algorithmic efficiency decreases rapidly as the order of the
spatial discretization increases, that is, as the corresponding
stencil size grows. Test calculations on different parallel
architectures indicated that the machine efficiency is even
considerably lower than the algorithmic efficiency. Furthermore,
the paper reported on initial experience gained with the
implementation of the DNS solver on an SGI Power Challenge Array
(4 nodes each comprised of a 16-CPU shared memory parallel
machine) using a combination of fine-grained (shared memory) and
coarse-grained parallelism (explicit message passing). The
results were very promising; however, this parallelization
strategy needs further investigation.
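
The dependence of algorithmic efficiency on stencil size can be
illustrated with a deliberately crude model. The definition used
in [21] is not given in this report, so the formula below is an
assumption made for illustration only: a cubic block of n points
per direction is taken to carry w extra layers on each side,
where w is the stencil half-width, and the efficiency is taken as
the ratio of useful points to processed points.

```python
def algorithmic_efficiency(n, w, dim=3):
    """Assumed model: useful points / (useful + halo points) for a block of
    n points per direction with w halo layers on each side."""
    return (n / (n + 2 * w)) ** dim

def efficiency_table(block_sizes, half_widths):
    """Efficiency for each (block size, stencil half-width) pair."""
    return {(n, w): algorithmic_efficiency(n, w)
            for n in block_sizes for w in half_widths}
```

Under this assumed model, splitting a fixed mesh into more and
smaller blocks, or widening the stencil at a fixed block size,
both lower the efficiency, in qualitative agreement with the
trend reported above.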

The paper by Sibilla and Vitaletti [25] did not show any parallel
computations, but it addressed several important aspects of
multiblock-structured grid algorithms in a parallel computing
environment. As in [4] the management of data communication
between adjacent blocks is provided by a parallel library
(PARAGRID) which ensures that the same average values are
assigned to all replicas of the same boundary node owned by
different blocks. The paper discussed the influence of block
subdivision on accuracy and efficiency within the framework of a
multigrid scheme. The solution algorithm has been modified in
order to account for the presence of locally unstructured
topologies at block boundaries (singular points). For some test
cases it could be demonstrated that the convergence of the
numerical method could only be ensured with this modification.
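
The idea of assigning the same average value to all replicas of a
shared boundary node can be sketched in a few lines. This is not
the PARAGRID interface, which is not described in this report;
the data layout (one dictionary per block, keyed by a global node
number) and the function name are assumptions made for
illustration.

```python
from collections import defaultdict

def average_replicas(blocks):
    """blocks: list of dicts {global_node_id: value}. A node that appears in
    several blocks is a replica; all its copies receive the mean value."""
    total = defaultdict(float)
    count = defaultdict(int)
    for blk in blocks:
        for node, val in blk.items():
            total[node] += val
            count[node] += 1
    for blk in blocks:
        for node in blk:
            if count[node] > 1:         # only shared nodes are modified
                blk[node] = total[node] / count[node]
    return blocks
```

A node shared by an arbitrary number of blocks, as occurs at
singular points, is handled by the same loop, since the count of
replicas is not fixed in advance.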

In conclusion, most of the papers focused on some specific
algorithmic aspects of parallel computing. Effort was essentially
put into adjusting sequential algorithms rather than developing
new parallel schemes. Only a few large scale CFD applications
have been presented demonstrating the capabilities and limits of
parallel architectures for industrial CFD applications. One of
the main challenges for parallel complex applications is the load
balanced partitioning of the flow domain, which is essential for
obtaining optimal machine performance. These important issues
were hardly addressed in the conference.

2.3  Advanced  Spatial  Discretization  Schemes 

Although in the last decade extensive research has been ongoing
towards the development of accurate Euler and Navier-Stokes
solvers, the improvement of spatial discretization schemes is
still a major concern in CFD. Suitable discretization schemes are
expected to offer certain properties. These are conservation, at
least second order accuracy in smooth flow regions and sharp
resolution of discontinuities and viscous shear layers. High
resolution of all physical phenomena is required on a
computational mesh with a minimum number of grid points.
Furthermore, the spatial discretization should support a robust
and efficient time integration. Recently, substantial progress
has been made in this area and many different promising
approaches for the improved discretization of the Euler and
Navier-Stokes equations are known in the literature. Among these
are, for example, improved shock capturing algorithms based on
flux difference and flux vector splitting, multidimensional
upwinding, residual distribution schemes and kinetic flux
splitting. These methods have been investigated in detail for one
and two-dimensional flows. Very often, however, their superiority
to conventional methods has only been demonstrated for simple
test cases. Therefore, the key issue remains to demonstrate the
improved capabilities of the advanced methods for relevant 2-D
and 3-D viscous flows around more complex geometries.

At the symposium several papers, [9], [10], [12], [14], [15],
[16] and [17], were specifically devoted to improvements of the
spatial discretization of Euler/Navier-Stokes solvers. The paper
by Delanaye et al. [9] presented the development of a new
quadratic reconstruction finite volume scheme for unstructured
polygonal meshes. The most frequently employed linear cell
reconstruction of the flow variables requires sufficiently
regular grids for second order accuracy and it results in a first
order scheme for irregular meshes. In contrast to this, the
proposed quadratic reconstruction provides a full second order
scheme even for very irregular meshes. In order to avoid spurious
oscillations in the vicinity of discontinuities, the quadratic
reconstruction is switched to a monotone constant one with the
help of a properly defined discontinuity detector. The method is
designed to deal with adaptive unstructured grids consisting of
cells with an arbitrary number of edges. Time integration is
performed by an implicit scheme based on Newton-Krylov
techniques. The efficiency and high accuracy of the numerical
method were demonstrated for various 2-D inviscid and viscous
laminar computations including test cases with locally distorted
meshes. However, the method needs to be carefully investigated
for 3-D complex geometries and turbulent flows at high Reynolds
numbers where highly irregular meshes are expected. Furthermore,
the sensitivity of the quadratic reconstruction with respect to
implementation of wall boundary conditions has to be
investigated.

The  paper  delivered  by  Villedieu  et  al.  [10]  presented  a 
second  order  scheme  based  on  kinetic  flux  splitting.  The 
main feature of this approach is that, under a CFL-like
condition, density and energy can be proved to remain
nonnegative.  This  makes  the  method  very  attractive  for 
predictions  of  flow  fields  with  near  vacuum  conditions, 
such as flows around hypersonic vehicles. Promising results
were shown for 2-D supersonic and hypersonic inviscid
flows in comparison with the classical Roe flux difference
split scheme. First 3-D results were presented for a
wing-alone application, but they do not yet allow the
approach to be assessed for more complex 3-D applications.
Furthermore, detailed remarks on the convergence behavior
of the method were missing.
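
The positivity property can be illustrated with a minimal
first-order sketch in Python. It assumes the standard
Maxwellian-based kinetic flux vector splitting for the 1-D
Euler equations, which is not necessarily the second order
scheme of [10]; the function names, the test case (a strong
double rarefaction) and the CFL number of 0.4 are
illustrative choices.

```python
import math

GAMMA = 1.4


def kfvs_split_flux(rho, u, p, sign):
    """Half-range Maxwellian flux F+ (sign=+1) or F- (sign=-1), 1-D Euler."""
    beta = rho / (2.0 * p)
    s = u * math.sqrt(beta)
    a = 0.5 * (1.0 + sign * math.erf(s))
    b = sign * math.exp(-s * s) / (2.0 * math.sqrt(math.pi * beta))
    e = p / (GAMMA - 1.0) + 0.5 * rho * u * u      # total energy per volume
    return (rho * (u * a + b),
            (p + rho * u * u) * a + rho * u * b,
            (e + p) * u * a + (e + 0.5 * p) * b)


def primitive(q):
    """Convert conservative (rho, rho*u, E) to primitive (rho, u, p)."""
    rho, m, e = q
    u = m / rho
    return rho, u, (GAMMA - 1.0) * (e - 0.5 * m * u)


def step(cells, dx, cfl):
    """One explicit first-order step with transmissive boundaries."""
    prim = [primitive(q) for q in cells]
    smax = max(abs(u) + math.sqrt(GAMMA * p / rho) for rho, u, p in prim)
    dt = cfl * dx / smax
    ext = [prim[0]] + prim + [prim[-1]]            # one ghost cell per side
    flux = []
    for left, right in zip(ext[:-1], ext[1:]):
        fp = kfvs_split_flux(*left, 1.0)           # right-going particles
        fm = kfvs_split_flux(*right, -1.0)         # left-going particles
        flux.append(tuple(x + y for x, y in zip(fp, fm)))
    new = []
    for i, q in enumerate(cells):
        new.append(tuple(q[k] - dt / dx * (flux[i + 1][k] - flux[i][k])
                         for k in range(3)))
    return new, dt


def run_double_rarefaction(n=100, t_end=0.1, cfl=0.4):
    """Two streams moving apart, leaving a near-vacuum region in between."""
    dx = 1.0 / n
    cells = []
    for i in range(n):
        u = -2.0 if (i + 0.5) * dx < 0.5 else 2.0
        rho, p = 1.0, 0.4
        cells.append((rho, rho * u, p / (GAMMA - 1.0) + 0.5 * rho * u * u))
    t = 0.0
    while t < t_end:
        cells, dt = step(cells, dx, cfl)
        t += dt
    return [primitive(q) for q in cells]
```

Calling run_double_rarefaction() returns the primitive
variables at the final time; the smallest density and
pressure in the result can then be inspected directly.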

The  paper  by  Briggs  et  al.  [13]  addressed  the  effect  of 
certain  parameters  of  a  classical  TVD  scheme  on  the  so¬ 
lution  of  the  specific  viscous  flow  problem  of  a  trans¬ 
verse  jet  interacting  with  a  supersonic  flow.  As  already 
known  from  many  other  applications,  the  size  of  the  en¬ 
tropy  correction  parameter  and  the  choice  of  the  flux 
limiter  can  significantly  influence  the  accuracy  of  the 
viscous  solution,  especially  on  coarse  meshes  or  meshes 
with  improper  point  distribution  in  the  boundary  layer. 
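
To make the two parameters concrete, the following Python
sketch shows a Harten-type entropy correction and two
common flux limiters. The function names and the sample
values of delta are illustrative assumptions; the actual
scheme and settings of [13] are not reproduced here.

```python
def harten_entropy_fix(lam, delta):
    """Smoothed |lam|: equals |lam| for |lam| >= delta, parabolic below."""
    a = abs(lam)
    if a >= delta:
        return a
    return (lam * lam + delta * delta) / (2.0 * delta)


def minmod(r):
    """Minmod limiter as a function of the slope ratio r (most diffusive)."""
    return max(0.0, min(1.0, r))


def van_leer(r):
    """Van Leer limiter: smooth and less diffusive than minmod."""
    return (r + abs(r)) / (1.0 + abs(r))


# A larger delta leaves more dissipation where an eigenvalue vanishes.
for delta in (0.05, 0.25):
    print(delta, harten_entropy_fix(0.0, delta))
```

At a vanishing eigenvalue the corrected value is delta/2,
so the parameter directly sets the residual dissipation
there, while the limiter controls how much of the second
order correction is retained near steep gradients.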

The  papers  [14],  [15]  and  [16]  were  devoted  to  multidi¬ 
mensional  upwinding.  Vinckier  et  al.  [14]  presented  a 
so-called  flux  filter  scheme  which  operates  on  the  dis¬ 
crete  cell  flux  balance  and  assigns  filtered  parts  of  the 
residuals  to  the  corresponding  cell  vertices  according  to 
the characteristic propagation directions. For stability
reasons some kind of artificial viscosity, similar to the
classical central scheme, has to be added, which is an
unwelcome feature in the framework of a multidimensional
upwind approach. Results for 2-D inviscid and
viscous laminar flows on both structured and unstructured
meshes were presented, indicating high resolution
of flow features like shocks and expansion fans. However,
for airfoil flows improved accuracy of the method
compared to classical central or upwind schemes was
not demonstrated.

The paper by Paillère et al. [15] reviewed recent developments
in multidimensional upwind schemes based on
the residual decomposition or fluctuation splitting approach.
Substantial progress has been made in the implementation
of truly multidimensional upwinding in
which, unlike the standard upwind schemes, the upwind
biasing is determined by properties of the physics rather
than the computational mesh. For scalar conservation
laws various advection schemes distributing the conservative
flux balance to only the downstream nodes have
been developed. These schemes can be designed such
that properties such as conservation, positivity and second
order accuracy are guaranteed. It was reported that the
extension of these schemes to Euler/Navier-Stokes equations
is straightforward provided that a conservative
linearization can be found. This can easily be achieved
for triangular meshes, whereas for quadrilateral meshes
it is more difficult and still the subject of ongoing research.
The paper presented various numerical examples for 2-D
flows demonstrating the ability of the residual decomposition
approach. In particular, the results indicate the
improved resolution of flow discontinuities which are not
aligned  with  mesh  lines.  Unfortunately,  the  issue  of  ac¬ 
curate  prediction  of  turbulent  viscous  flows  was  not  ad¬ 
dressed  in  the  paper.  Furthermore,  no  3-D  results  were 
shown.  The  residual  decomposition  schemes  have  been 
successfully  combined  with  implicit  methods  and  solu¬ 
tion  adaptive  techniques. 
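
The idea of distributing a cell residual to downstream
nodes only can be sketched for scalar advection on a
single linear triangle. The Python example below uses the
first-order N scheme, one well-known member of this class,
as an assumed illustration; it is not claimed to be the
particular scheme used in [15], and the function name is
hypothetical.

```python
def n_scheme_residuals(verts, u, a):
    """Distribute the residual of a . grad(u) over a linear triangle.

    verts: three (x, y) vertices in counter-clockwise order
    u:     nodal values
    a:     constant advection velocity (ax, ay)
    Returns (nodal contributions, total cell residual).
    """
    k = []
    for i in range(3):
        xj, yj = verts[(i + 1) % 3]
        xk, yk = verts[(i + 2) % 3]
        nx, ny = yj - yk, xk - xj       # inward normal scaled by edge length
        k.append(0.5 * (a[0] * nx + a[1] * ny))
    phi_total = sum(ki * ui for ki, ui in zip(k, u))
    k_minus = [min(0.0, ki) for ki in k]
    inflow = sum(k_minus)
    if inflow == 0.0:                   # zero velocity: nothing to distribute
        return [0.0, 0.0, 0.0], phi_total
    u_in = sum(km * ui for km, ui in zip(k_minus, u)) / inflow
    # Only nodes with k_i > 0 (downstream nodes) receive a contribution.
    phi = [max(0.0, ki) * (ui - u_in) for ki, ui in zip(k, u)]
    return phi, phi_total
```

For the reference triangle (0,0), (1,0), (0,1) with
a = (1, 0.5), only the two downstream nodes receive a
share, and the shares add up to the total cell residual,
which is the conservation property referred to above.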

The paper by Van Ransbeeck and Hirsch [16] presented
an  alternative  approach  for  multidimensional  upwind 
schemes  on  structured  meshes.  In  this  framework  the 
numerical  flux  is  formulated  using  the  artificial  dissipa¬ 
tion  concept.  The  diffusive  contribution  is  constructed 
with  directional  terms,  whereas  the  antidiffusive  term  is 
designed  according  to  the  direction  of  the  convection 
speed  and  to  variations  of  the  solution  in  different  mesh 
directions.  The  paper  presented  a  classification  of  first 
and  second  order  accurate  schemes  that  have  respec¬ 
tively  minimum  and  zero  cross  diffusion.  Second  order 
monotone  schemes  have  been  developed  using  the  con¬ 
cept  of  non-linear  limiter  functions  applied  to  multidi¬ 
mensional  ratios  of  flux  differences.  Extensions  of  the 
scalar  dissipation  model  to  the  Euler/Navier-Stokes 
equations  have  been  achieved  through  a  characteristic 
decomposition.  Different  choices  for  the  propagation  di¬ 
rection  are  possible.  Promising  results  were  presented 
for  2-D  and  3-D  supersonic  test  cases  showing  compara¬ 
ble  or  somewhat  improved  accuracy  with  respect  to 
classical  second  order  dimensional-split  upwind 
schemes.  However,  no  results  for  subsonic  test  cases 
such  as  flow  past  an  airfoil  were  shown.  Therefore,  a 
comprehensive  assessment  of  the  concept  is  not  possible 
at  the  current  stage  of  research. 

In  summary,  promising  approaches  were  presented, 
which  aimed  at  improving  the  accuracy  of  state-of-the- 
art  Euler/Navier-Stokes  solvers.  In  particular,  the  higher 
order  reconstruction  approach  and  the  multidimensional 
upwind  schemes  offer  properties  which  in  theory  make 
them  superior  to  standard  algorithms.  Numerical  results 
for  various  two  dimensional  test  problems  support  this. 
However,  in  order  to  push  the  implementation  of  these 
advanced techniques into 3-D production codes for viscous
flow calculations, further investigations are required.
A critical assessment should include sensitivity
studies with respect to grid fineness and grid regularity
for  transonic  2-D  and  3-D  viscous  flows.  It  should  be 
clarified  whether  with  these  new  concepts  substantial 
progress  can  be  made  towards  accurate  drag  prediction 
for  2-D  and  3-D  configurations  at  relevant  Reynolds 
numbers. 

2.4  Unstructured  Grids,  Hybrid  Grids,  Overlapping 
Grids  and  Meshless  Techniques 

The  key  problem  of  numerical  simulation  of  complex 
configurations  is  the  construction  of  an  appropriate  grid 
to  represent  the  computational  domain  of  interest.  Grid 
generation  is  the  decisive  factor  concerning  the  turn 
around  time  of  simulations  for  industrial  applications. 
Essentially  two  alternative  strategies  exist,  namely 
structured  and  unstructured  meshes.  Currently,  block- 
structured  body-fitted  meshes  are  most  widely  used. 
They  have  been  proved  to  be  well  suited  for  viscous  cal¬ 
culations  and  they  form  the  building  blocks  of  most  of 
the  industrial  state-of-the-art  production  codes.  How¬ 
ever,  with  this  approach,  grid  generation  for  complex 
geometries  itself  is  the  major  challenge.  Various  strate¬ 
gies  are  being  developed  to  simplify  the  grid  generation 
problems.  Among  these  are  the  overlapping  grid  tech¬ 
niques  where  the  structured  grids  of  various  blocks  may 
overlap.  The  alternative  approach  is  to  divide  the  com¬ 
putational  domain  into  an  unstructured  assembly  of 
computational  cells  by  using  tetrahedra  or  general  po¬ 
lygonal  volumes.  In  contrast  to  structured  meshes,  this 
strategy  substantially  simplifies  the  discretization  of 
complex  geometries.  On  the  other  hand,  it  complicates 
the design of accurate and efficient algorithms. While in
the  past  promising  and  flexible  unstructured  methodolo¬ 
gies  have  been  developed  for  inviscid  flows,  the  accu¬ 
rate  calculation  of  viscous  flows  using  unstructured 
meshes  is  still  an  important  issue  of  current  research.  In 
particular,  efficient  simulation  of  high  Reynolds  number 
flows  requires  extremely  stretched  cells,  which  in  the 
case  of  tetrahedral  meshes  lead  to  tetrahedra  with  acute 
angles.  This  may  cause  numerical  errors,  at  least  for 
classical  schemes  currently  used  in  industrial  codes.  An 
interesting  alternative  is  the  use  of  hybrid  grids  consist¬ 
ing  of  tetrahedra  and  hexahedral  or  prismatic  cells.  It  of¬ 
fers  the  possibility  of  combining  the  flexibility  of  tetra¬ 
hedral  meshes  with  the  accuracy  of  regular  grids  in  the 
boundary  layer.  The  ability  of  this  approach  to  simulate 
turbulent  flows  around  complex  3-D  geometries  is  still 
to  be  verified. 

Some of the aforementioned issues concerning the use
of more flexible grid structures were addressed during
the meeting. The paper delivered by Ramakrishnan et
al. [8] presented the experience with unstructured grid
computations gained at Rockwell Science Center over
the  past  several  years.  One  of  the  most  important  lessons 
they  have  learned  from  many  3-D  applications  is  the 
fact  that  in  spite  of  all  the  advances  that  have  been  made 
in  the  field  of  unstructured  procedures,  on  comparable 
grid  fineness  structured-grid  simulations  yield  more  ac¬ 
curate  solutions.  The  authors  concluded  that  for  inviscid 
flows  unstructured  Euler  solvers  have  a  clear  edge  over 
their  structured  counterparts.  This  is  due  to  the  fact  that 
the  solution  of  Euler  equations,  unlike  Navier-Stokes 
equations,  does  not  require  very  fine  meshes  in  the  vi¬ 
cinity  of  solid  bodies.  Therefore,  unstructured  grid  gen¬ 
eration  becomes  much  easier  to  handle  and  several  com¬ 
putations  for  many  different  configurations  can  be 
carried out in a matter of a few weeks. In the case of viscous
flows, however, the stringent resolution requirements
in the wall normal direction make structured
solvers more suitable for efficient calculations. In the
framework  of  unstructured  meshes,  paper  [8]  presented 
a  generalization  of  the  implementation  of  boundary  con¬ 
ditions  which  allows  the  specification  of  interior  bound¬ 
aries  anywhere  in  the  computational  domain.  This  con¬ 
cept  allows  the  effective  computation  of  moving  bodies, 
like  in  the  case  of  aircraft  store  release.  However,  no  de¬ 
tailed  numerical  results  were  shown. 

The  paper  by  Galle  [28]  addressed  the  solution  of  Euler 
and  Navier-Stokes  equations  on  hybrid  grids  consisting 
of  prismatic  cells  near  the  body  surface  and  tetrahedral 
cells  elsewhere.  The  use  of  prismatic  cells  offers  the 
possibility  to  efficiently  and  accurately  resolve  regions 
such  as  boundary  layers  by  applying  high  aspect  ratio 
cells  in  the  respective  areas.  An  upwind  finite  volume 
scheme  has  been  implemented  on  an  auxiliary  mesh  of 
control  volumes.  This  dual  mesh  formulation  guarantees 
conservation  in  the  entire  flow  field  and  in  particular  at 
interfaces  between  prismatic  and  tetrahedral  domains. 
The  integration  in  time  is  performed  by  an  explicit  mul¬ 
tistage  scheme  accelerated  by  a  multigrid  technique 
based  on  agglomeration  of  control  volumes.  Promising 
numerical  results  were  shown  for  3-D  inviscid  and  2-D 
viscous  flows  demonstrating  the  ability  of  the  method. 
However,  further  3-D  viscous  calculations  for  more 
complex geometries are required to prove the concept of
the  hybrid  mesh  approach. 
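
Why a dual mesh formulation is conservative regardless of
the cell types can be seen from a minimal edge-based
sketch in Python. Each dual face is visited once and its
flux is added to one control volume and subtracted from
the neighbour. The data layout (a list of edges with
dual-face areas) and the scalar upwind flux are
assumptions made for illustration, not details of [28].

```python
def accumulate_residuals(n_nodes, edges, u, flux):
    """Edge-based finite-volume residuals on a vertex-centred dual mesh.

    edges: list of (i, j, s), with s the dual-face area between nodes i and j
    flux:  numerical flux function f(u_left, u_right)
    """
    res = [0.0] * n_nodes
    for i, j, s in edges:
        f = s * flux(u[i], u[j])
        res[i] += f        # leaves control volume i ...
        res[j] -= f        # ... and enters control volume j
    return res


def upwind_flux(ul, ur, speed=1.0):
    """First-order upwind flux for linear advection (illustrative only)."""
    return speed * (ul if speed >= 0.0 else ur)
```

Because the loop does not distinguish whether an edge
belongs to a prismatic or a tetrahedral cell, interior
contributions cancel in pairs across the interface between
the two regions as well; boundary fluxes are omitted in
this sketch.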

The  paper  delivered  by  Brenner  [29]  presented  a  com¬ 
putational  procedure  to  simulate  rocket  stage  separation. 
The  Euler  equations  with  mixing  gases  are  solved  with 
an  upwind  finite  volume  method  on  unstructured 
meshes,  which  may  consist  of  a  combination  of  tetrahe¬ 
dral,  prismatic  and  hexahedral  cells.  In  order  to  simulate 
the motion of bodies, a conservative overlapping grid
technique has been implemented. A temporal adaptive
algorithm  is  used  to  calculate  the  unsteady  flow  field.  A 
very  impressive  application  was  shown,  which  however 
did  not  allow  a  critical  assessment  of  the  method  con¬ 
cerning  its  accuracy. 

The interesting concept of meshless simulation techniques
for fluid flow problems was presented by Onate
et al. [11]. Following the work of Batina, the discrete
approximation of the governing equations uses a cloud
of arbitrary points. Unlike conventional meshes, no
fixed connectivities between the points are needed and
therefore grid properties like regular cells or non-negative
cell volumes are not required. A weighted least
squares interpolation is used to construct a linear or quadratic
function from the values given at the arbitrary
points  in  the  local  interpolating  domain.  First  examples 
for  the  solution  of  the  1-D  convection  diffusion  equation 
and  for  2-D  compressible  inviscid  flows  were  shown. 
For  these  calculations  the  points  generated  by  an  un¬ 
structured  mesh  have  been  used.  It  was  pointed  out  that 
major  difficulties  of  this  approach  are  the  definition  of 
the  local  interpolating  domain  and  the  selection  of  the 
most  significant  points  for  the  interpolation  in  each  do¬ 
main.  The  interpolation  strategy  strongly  influences  the 
quality  of  the  solution.  Another  drawback  is  that  the 
method is not conservative. Furthermore, it is quite difficult
to assess the accuracy of the numerical procedure
if  a  set  of  arbitrary  points  is  used.  However,  since  in  the¬ 
ory  the  meshless  approach  does  not  require  a  suitable 
grid  of  high  quality  and  allows  an  efficient  adaption 
strategy,  further  research  in  this  area  is  encouraged. 
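
A weighted least squares fit on a point cloud can be
sketched as follows in Python for the linear case in 2-D.
The inverse-distance weighting, the function name and the
collinearity threshold are assumptions chosen for
illustration; the weighting and the selection of cloud
points used in [11] may differ.

```python
def wls_gradient(x0, u0, cloud, power=2.0):
    """Weighted least-squares gradient at x0 from a cloud [(x, y, u), ...]."""
    sxx = sxy = syy = bx = by = 0.0
    for x, y, u in cloud:
        dx, dy, du = x - x0[0], y - x0[1], u - u0
        w = 1.0 / (dx * dx + dy * dy) ** (power / 2.0)   # inverse-distance weight
        sxx += w * dx * dx
        sxy += w * dx * dy
        syy += w * dy * dy
        bx += w * dx * du
        by += w * dy * du
    det = sxx * syy - sxy * sxy
    if abs(det) < 1e-14:
        raise ValueError("degenerate cloud: points are (nearly) collinear")
    # Solve the 2x2 normal equations by Cramer's rule.
    return ((syy * bx - sxy * by) / det, (sxx * by - sxy * bx) / det)
```

The fit is exact for a linear field whatever the position
of the points, but it fails when the selected points are
collinear, which is one simple instance of the difficulty
of choosing the local interpolating domain.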

The papers reviewed above addressed different grid
strategies which are more sophisticated than structured
meshes and which are expected to improve or even
enable the simulation of complex 3-D configurations.
Promising  results  for  various,  mostly  inviscid  test  cases 
were  shown.  However,  the  abilities  of  the  advanced 
techniques  to  accurately  calculate  turbulent  viscous 
flows  around  more  complex  geometries  were  not  dem¬ 
onstrated  at  the  conference. 

2.5  Adaptive  Schemes 

In  recent  years  adaptive  grid  methods  for  computational 
fluid  dynamics  have  gained  popularity  due  to  their  po¬ 
tential  to  provide  highly  accurate  solutions  on  the  basis 
of  cost-effective  calculations.  In  contrast  to  global  re¬ 
duction  of  the  mesh  interval,  very  fine  mesh  cells  are  re¬ 
stricted  to  those  regions  where  flow  features  need  high 
grid  resolution;  elsewhere  the  computational  grid  may 
be quite coarse. Grid adaption methods can be categorized
into either point redistribution or mesh embedding/enrichment.
Point redistribution schemes maintain
a  constant  number  of  points,  which  are  moved  such  that 
they  congregate  near  flow  features.  This  technique  can 
be  easily  implemented  into  existing  structured  and  un¬ 
structured  flow  solvers.  However,  it  can  lead  to  quite 
skewed  grids,  especially  in  the  case  of  structured 
meshes. The grid embedding technique adds points to the
existing  grid.  This  procedure  maintains  the  global  grid 
accuracy  outside  embedded  regions  and  simultaneously 
increases  the  accuracy  in  the  embedded  regions.  The 
key  issue  for  adaptive  methods  is  the  design  of  suitable 
error  estimators.  By  far  the  most  common  approach  is  to 
use  physical  features  such  as  local  solution  gradients. 
These indicators efficiently detect high-gradient regions
such as shock waves; however, the global error may not
necessarily be reduced and the numerical solution may
depend on the adaption pattern. Recently, more advanced,
direct error estimators have come into use. They are either
based on the discretization error, which may be estimated
by comparing quantities calculated on two meshes of
different fineness, or on the residual error. This strategy is
very  promising  but  needs  further  research,  especially  if 
it  is  applied  to  viscous  flows.  In  conclusion,  adaptive 
strategies  are  considered  as  one  of  the  pacing  items  of 
algorithmic  research.  Issues  which  have  to  be  clarified 
for  complex  applications  are  the  development  of  suit¬ 
able  adaption  criteria  allowing  grid  independent  solu¬ 
tions  and  dynamic  load  balancing  for  parallel  comput¬ 
ing. 
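
Mesh enrichment driven by a simple feature-based indicator
can be sketched in 1-D as follows. The Python example
flags every interval across which the solution jump
exceeds a threshold and inserts its midpoint; the test
profile, the threshold and the number of passes are
illustrative assumptions.

```python
import math


def refine_by_gradient(x, f, threshold):
    """Insert a midpoint in every interval whose solution jump exceeds threshold."""
    new_x = [x[0]]
    for left, right in zip(x[:-1], x[1:]):
        if abs(f(right) - f(left)) > threshold:
            new_x.append(0.5 * (left + right))     # mesh enrichment
        new_x.append(right)
    return new_x


def adapt(x, f, threshold, max_passes=5):
    """Repeat the enrichment until no interval is flagged or passes run out."""
    for _ in range(max_passes):
        refined = refine_by_gradient(x, f, threshold)
        if len(refined) == len(x):
            break
        x = refined
    return x


profile = lambda s: math.tanh(50.0 * (s - 0.5))    # steep layer at s = 0.5
coarse = [i / 10.0 for i in range(11)]
fine = adapt(coarse, profile, threshold=0.2)
```

The added points cluster around the steep layer while the
intervals far from it keep their original size. As noted
above, such an indicator locates the feature but gives no
direct control over the global error.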

The  issue  of  adaptivity  has  been  addressed  by  many 
conference  papers.  Papers  [9],  [15]  and  [34]  reported  on 
adaptive  refinement  in  the  context  of  unstructured  solv¬ 
ers  based  on  insertion  and  removal  of  grid  points.  The 
papers  [15]  and  [34]  presented  an  adaption  strategy 
which  relies  on  a  finite  element  error  estimator.  Whereas 
in  [15]  the  application  is  restricted  to  steady  2-D  invis¬ 
cid  flows,  Friedrich  et  al.  [34]  presented  a  dynamical 
adaption  for  various  2-D  unsteady  flows.  The  error  indi¬ 
cator,  which  has  been  proved  reliable  for  many  inviscid 
calculations,  is  currently  being  extended  to  the  Navier- 
Stokes  equations.  First  grid  adaptions  for  viscous  flows 
were  shown.  Furthermore,  the  finite  element  residual 
has  been  successfully  used  for  a  3-D  inviscid  wing  ap¬ 
plication.  The  adaptive  unstructured  solver  of  [34]  has 
been  parallelized  on  the  basis  of  an  intelligent  dynamic 
load  balancing  procedure  for  performance  controlled 
domain  decomposition.  In  the  parallel  mode  an  explicit 
time  integration  is  employed,  whereas  on  a  sequential 
computer  unsteady  calculations  are  carried  out  by  an 
implicit method using a preconditioned GMRES technique.

The paper by Becker et al. [24] addressed the adaptive
grid  refinement  for  block  structured  solvers.  In  this  con¬ 
cept,  locally  refined  mesh  blocks  are  patched  into  the 
existing  mesh.  The  additional  fine  subblocks  are  con¬ 
nected  with  the  original  mesh  via  the  multigrid  tech¬ 
nique.  The  level  of  local  truncation  error  is  used  as  error 
indicator.  Following  the  idea  of  Brandt,  truncation  error 
estimates  can  be  extracted  directly  from  the  multigrid 
cycle.  So  far,  the  refinement  procedure  is  set  up  outside 
the  flow  solver.  First  results  presented  for  2-D  and  3-D 
inviscid  and  viscous  test  cases  show  the  feasibility  of 
the strategy of subblock refinement. However, considerably
more work is required to establish a fully automatic,
robust and accurate adaption method.
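
How a truncation error estimate falls out of two grid
levels can be sketched in Python for a 1-D model operator.
The example assumes the second-difference approximation of
the second derivative, injection as the transfer operator
and a scheme of order two; these choices and the function
names are illustrative and are not taken from [24].

```python
def second_difference(u, i, h, stride=1):
    """Central second difference of samples u at index i, spacing stride*h."""
    hs = stride * h
    return (u[i - stride] - 2.0 * u[i] + u[i + stride]) / (hs * hs)


def relative_truncation_error(u, i, h):
    """tau = L_2h(I u_h) - I(L_h u_h) at fine-grid index i, using injection."""
    return second_difference(u, i, h, stride=2) - second_difference(u, i, h)


def fine_grid_error_estimate(u, i, h, order=2):
    """Estimate of the fine-grid truncation error derived from tau."""
    return relative_truncation_error(u, i, h) / (2 ** order - 1)
```

For samples of x**4 with spacing h, the fine-grid operator
returns 12*x**2 + 2*h**2 and the coarse one
12*x**2 + 8*h**2, so tau equals 6*h**2 and the estimate
recovers the fine-grid truncation error 2*h**2. Both
operators are already evaluated within a multigrid cycle,
which is why the indicator comes at little extra cost.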

Van  der  Vegt  et  al.  [20]  presented  a  hexahedron  based 
grid adaption procedure. The method uses the discontinuous
Galerkin finite volume formulation with local grid
enrichment. A directional grid adaption is employed
which allows subdividing of cells, independently in
each of the three local grid directions. This anisotropic
grid refinement is expected to be more efficient in capturing
local flow phenomena than isotropic refinement,
since  many  flow  features  are  one-dimensional.  The  sen¬ 
sor  uses  primitive  variables  and  is  constructed  such  that 
it  prevents  regions  with  discontinuities  from  constantly 
dominating  the  local  grid  refinement  procedure.  The  ca¬ 
pability  of  the  adaptive  method  was  demonstrated  by 
calculations of the inviscid transonic flow around a generic
delta wing. From the authors' viewpoint, the hexahedral
based adaptive solver is a good candidate for
large  eddy  simulations  (LES),  because  it  offers  the  op¬ 
portunity  to  accurately  capture  viscous  sublayers  with 
successively  fine  grids  through  local  grid  refinement. 
LES  results,  however,  were  not  shown. 

Grid  adaptive  procedures  based  on  point  redistribution 
were  discussed  in  the  papers  [12],  [27],  [33].  This  tech¬ 
nique  was  mainly  used  in  the  framework  of  moving 
grids  for  unsteady  calculations. 

In  conclusion,  various  adaptive  strategies  were  pre¬ 
sented.  The  important  issue  of  developing  a  suitable  in¬ 
dicator  for  adaption  was  addressed.  Various  error  esti¬ 
mators  have  been  proposed  and  successfully  applied  to 
inviscid  flows.  However,  further  research  is  needed  to 
establish  efficient  and  robust  adaptive  methods  for  vis¬ 
cous  flows. 

2.6  Fast  Implicit  and  Iterative  Solvers 

As  numerical  flow  simulations  pave  their  way  into  the 
practical aerodynamic design process, the need for efficient
algorithms to solve the spatially discretized
Euler/Navier-Stokes equations has become very obvious.
Many  solvers  still  used  in  current  aerospace  develop¬ 
ment  programs  exhibit  slow  convergence  towards  the 
desired  steady  state  solutions  which  leads  to  high  com¬ 
puter  costs  and  long  turn  around  times.  Consequently, 
there  is  a  substantial  amount  of  research  work  focused 
on  methods  for  convergence  acceleration.  Promising  ap¬ 
proaches  are  the  multigrid  time-stepping  technique  and 
the  Newton  iteration  with  fast  iterative  solvers.  In  struc¬ 
tured  codes  multigrid  techniques  based  on  explicit  mul¬ 
tistage  schemes  are  widely  used  and  they  have  been 
proved  to  yield  good  convergence  rates  for  many  practi¬ 
cal  applications.  However,  for  the  numerical  simulation 
of  high  Reynolds  number  flows,  the  convergence  of  the 
standard  multigrid  schemes  considerably  slows  down. 
This  is  due  to  the  stiffness  of  the  numerical  problem, 
which  is  introduced  through  the  high-aspect  ratio  cells 
required  for  the  efficient  solution  of  such  flow  fields. 
Therefore,  one  of  the  key  issues  concerning  algorithmic 
development  is  the  design  of  appropriate  multigrid  com¬ 
ponents,  such  as  smoothing  and  grid  transfer  operators, 
which  efficiently  tackle  high  aspect  ratio  cells. 
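
The loss of smoothing on stretched cells can be quantified
for a model problem. The Python sketch below evaluates, by
sampling Fourier modes, the smoothing factor of damped
point Jacobi for the 5-point Laplacian on cells of aspect
ratio hx/hy. The model operator, the damping factor 0.8
and full coarsening are assumptions; the multistage
schemes used in flow solvers behave analogously but are
not modelled here.

```python
import math


def jacobi_smoothing_factor(aspect_ratio, omega=0.8, samples=16):
    """Local-mode smoothing factor of damped Jacobi for the 5-point Laplacian.

    aspect_ratio = hx / hy. Full coarsening is assumed, so the high
    frequencies are those with max(|theta_x|, |theta_y|) >= pi / 2.
    """
    cx, cy = 1.0, aspect_ratio ** 2       # 1/hx^2 and 1/hy^2 up to a factor
    worst = 0.0
    for ix in range(-samples, samples + 1):
        for iy in range(-samples, samples + 1):
            tx = math.pi * ix / samples
            ty = math.pi * iy / samples
            if max(abs(tx), abs(ty)) < 0.5 * math.pi:
                continue                  # smooth mode, left to the coarse grid
            symbol = (cx * (1.0 - math.cos(tx))
                      + cy * (1.0 - math.cos(ty))) / (cx + cy)
            worst = max(worst, abs(1.0 - omega * symbol))
    return worst
```

For square cells the factor is 0.6, whereas for an aspect
ratio of 100 it is practically 1: modes that oscillate
only along the long cell direction are hardly damped,
which motivates modified smoothers or directional
coarsening.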

Interest  in  fast  iterative  methods  has  been  mainly  moti¬ 
vated  by  unstructured  solvers.  It  was  shown  that  cou¬ 
pling  Newton's  method  with  iterative  solvers  for  the  in¬ 
ner  iteration  is  an  effective  approach  for  solving  the 
large  systems  of  nonlinear  equations  arising  from  the 
discretization  of  Euler  and  Navier-Stokes  equations.  An 
interesting  feature  of  Newton's  method  is  its  ability  to 
provide superlinear asymptotic convergence. On the
other hand, efficient iterative schemes based on Newton's
iteration require excessive memory allocations for
three-dimensional applications. Therefore, strategies
have to be developed which eliminate the large storage
requirements but still retain the favorable convergence
characteristics of Newton's method.

The  paper  by  Pulliam  et  al.  [19]  gave  an  excellent  over¬ 
view  of  the  potentialities  and  drawbacks  of  Newton's 
method  applied  to  CFD  solvers.  For  practical  reasons,  in 
each  Newton  iteration  the  large  block  banded  matrix  is 
solved  by  an  iterative  matrix  solution  method.  In  partic¬ 
ular,  the  paper  addressed  the  class  of  Krylov  subspace 
methods  known  as  GMRES.  It  presented  practical  as¬ 
pects  and  implementation  issues  of  these  methods.  The 
main components of the Newton-GMRES approach,
such  as  evaluation  of  the  Jacobian,  matrix-vector  multi¬ 
ply  and  matrix  preconditioning,  were  discussed  with  re¬ 
spect  to  global  convergence  behavior,  memory  require¬ 
ments  and  accuracy.  Trade-offs  between  full  Newton 
and approximate Newton and other pertinent approximations
were investigated. The Newton-GMRES solver
was analyzed in the framework of a structured and an
unstructured 2-D Navier-Stokes code. In both cases very
promising results were shown. Calculations with similar
methods were also carried out in papers [9], [15]. It can
be concluded that optimal strategies which ensure favorable
convergence characteristics will lead to excessive
memory requirements. No 3-D calculations with Newton-Krylov
subspace techniques were presented at the
conference.
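
One common way to reduce the storage of a Newton-Krylov
method is to avoid forming the Jacobian, since a Krylov
solver such as GMRES only needs matrix-vector products.
The Python sketch below approximates such a product by a
finite difference of the residual. The small test system,
the function names and the choice of step size are
illustrative assumptions and do not reproduce the
implementation discussed in [19].

```python
import math


def residual(u):
    """Small nonlinear test system F(u) = 0 (purely illustrative)."""
    x, y = u
    return [x * x + y * y - 4.0, math.exp(x) + y - 1.0]


def jacobian_vector_product(func, u, v, eps=1e-7):
    """Approximate J(u) v by a one-sided finite difference of the residual."""
    norm_v = math.sqrt(sum(vi * vi for vi in v))
    if norm_v == 0.0:
        return [0.0] * len(u)
    # Scale the perturbation with the size of u and v.
    step = eps * (1.0 + math.sqrt(sum(ui * ui for ui in u))) / norm_v
    f0 = func(u)
    f1 = func([ui + step * vi for ui, vi in zip(u, v)])
    return [(a - b) / step for a, b in zip(f1, f0)]
```

Each product costs one additional residual evaluation and
no matrix storage. The preconditioner, however, usually
still requires some stored approximation of the Jacobian,
so the memory issue is reduced rather than removed.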

The  paper  by  Cambier  et  al.  [26]  proposed  a  new  im¬ 
plicit  algorithm  called  DDLU  factorization.  Compared 
to  the  classical  ADI  factorization,  this  strategy  enables  a 
reduction  in  both  CPU  time  and  memory.  The  new  im¬ 
plicit  technique  was  applied  to  a  3-D  supersonic  test 
case  on  a  relatively  coarse  mesh.  For  a  comprehensive 
assessment  of  this  technique  further  test  calculations  are 
required. 

The paper by Merkle et al. [18] was devoted to convergence
acceleration of the Navier-Stokes equations
through  a  time-derivative  preconditioning  of  the  gov¬ 
erning  equations.  Using  physical  arguments,  a  general¬ 
ized  preconditioner  was  developed,  ensuring  conver¬ 
gence  characteristics  which  are  independent  of  the 
Mach  number.  The  uniform  convergence  was  demon¬ 
strated  for  a  variety  of  applications  covering  a  wide 
range  of  Mach  numbers.  In  many  low  speed  cases,  the 
preconditioned  system  showed  a  much  improved  con¬ 
vergence  rate  while  having  no  detrimental  effects  in  re¬ 
gimes  where  the  original  method  already  worked  effi¬ 
ciently.  So,  preconditioning  of  the  governing  equations 
may  offer  the  possibility  to  develop  an  efficient  unified 
flow  solver  for  the  whole  Mach  number  regime.  Further 
research  is  required  to  establish  this  approach. 
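
The effect of time-derivative preconditioning on the
stiffness at low Mach numbers can be illustrated by
comparing wave speed ratios. The Python sketch below
assumes a preconditioner of the Weiss-Smith/Turkel type
for an ideal gas, with eigenvalues u and u' +/- c' and a
reference velocity equal to the local speed limited by the
speed of sound. This is an assumed, commonly quoted form
and not necessarily the generalized preconditioner of
[18]; it is valid for 0 < Mach < 1 and for Mach > 1.

```python
import math


def condition_numbers(mach, c=1.0):
    """Largest/smallest wave speed without and with preconditioning.

    Assumed eigenvalues of the preconditioned system: u, u' + c', u' - c'
    with u' = u (1 - alpha), c' = sqrt(alpha^2 u^2 + U_r^2),
    alpha = (1 - U_r^2 / c^2) / 2 and U_r = min(|u|, c).
    """
    u = mach * c
    plain = (abs(u) + c) / min(abs(u), abs(abs(u) - c))
    u_ref = min(abs(u), c)
    alpha = 0.5 * (1.0 - (u_ref / c) ** 2)
    u_p = u * (1.0 - alpha)
    c_p = math.sqrt(alpha * alpha * u * u + u_ref * u_ref)
    speeds = [abs(u), abs(u_p + c_p), abs(u_p - c_p)]
    return plain, max(speeds) / min(speeds)
```

At a Mach number of 0.001 the ratio of the original system
is about 1000, while that of the preconditioned system
stays below 3. For supersonic flow alpha vanishes and the
original eigenvalues are recovered, consistent with the
observation that no detrimental effects occur where the
original method already works efficiently.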

At  the  conference  none  of  the  papers  devoted  to  conver¬ 
gence  acceleration  addressed  the  key  problem  of  com¬ 
puting  realistic  Reynolds  number  flows.  These  flows  re¬ 
quire  computational  meshes  with  very  high  aspect  ratio 
or  irregular  cells  leading  to  very  stiff  discrete  equations. 
The  development  of  numerical  strategies  to  overcome 
the  stiffness  and  to  ensure  fast  convergence  in  these  flow 
situations  is  one  of  the  grand  challenges  in  algorithmic 
research. 

2.7  Turbulent  Flows,  LES/DNS 

The  key  problem  of  accurate  numerical  simulation  of 
complex flows is the description of transition and turbulence.
Currently, in all industrially relevant calculations,
the Reynolds averaged Navier-Stokes equations are
solved, in which only the statistically stationary flow is
calculated and the effects of turbulence are modelled by
a  so-called  turbulence  model.  However,  in  many  cases 
the  quality  of  the  solution  may  strongly  depend  on  the 
turbulence  model  used  in  the  calculation  and  at  best 
questionable  results  may  be  obtained  for  more  complex 
flow  phenomena  such  as  massive  flow  separation.  The 
rapid  increase  of  computer  resources  motivated  the  re¬ 
search  on  direct  numerical  simulation  (DNS)  or  large 
eddy  simulation  (LES)  of  turbulent  flows.  In  the  case  of 
DNS,  the  unsteady  Navier-Stokes  equations  are  solved 
directly.  No  turbulence  model  is  required  since  all  scales 
and  turbulence  motions  present  are  resolved  numeri¬ 
cally.  Due  to  excessive  computer  resources  required 
even  for  simple  geometries,  this  simulation  technique  is 
out of the question for practical applications. However, it
provides  a  very  important  methodology  for  turbulence 
research.  In  contrast  to  DNS,  the  large  eddy  simulation 
of  turbulent  flows  resolves  only  the  large  scale  structure 
of  the  turbulence,  while  the  effects  of  smaller  eddies  are 
described  by  a  statistical  subgrid  model.  As  the  resolu¬ 
tion  of  the  fine  scale  turbulence  motion  is  not  required, 
far  fewer  grid  points  are  needed  making  LES  feasible 
for  practical  problems  at  relevant  Reynolds  numbers  in 
the  near  future.  On  the  other  hand,  in  order  to  ensure  im¬ 
proved  results  compared  to  the  solution  of  the  Reynolds 
averaged  Navier-Stokes  equations  (RANS),  besides  the 
establishment  of  a  suitable  subgrid  model,  accurate  res¬ 
olution  of  the  viscous  sub-layers  in  the  near  wall  regions 
is  needed.  This  substantially  increases  the  number  of 
grid  points  for  LES  compared  to  RANS  solvers.  Fur¬ 
thermore,  since  time  accurate  solutions  are  calculated  in 
the  framework  of  LES,  significant  further  development 
of  the  classical  CFD  methods  is  needed.  In  addition  to 
the  validation  of  a  subgrid  model,  more  sophisticated  al¬ 
gorithms  such  as  adaptive  grids,  higher  order  discretiza¬ 
tions,  efficient  unsteady  solvers  and  parallel  computing 
have  to  be  made  available  for  LES  before  this  technique 
can  be  used  as  a  tool  for  flow  simulations. 

Numerical  aspects  of  DNS  and  LES  were  addressed  by 
papers  [5], [20], [21]  and  [22].  As  already  mentioned 
above,  the  papers  [5]  and  [21]  were  devoted  to  the  ex¬ 
ploitation  of  parallel  computers,  whereas  paper  [20]  pre¬ 
sented  a  grid  adaption  method  specially  designed  for 
LES.  The  focus  of  Comte  et  al.  [22]  was  the  investiga¬ 
tion  of  subgrid  scale  models  within  a  classical  explicit 
finite  difference  method.  The  aim  of  the  presentation 
was  to  show  some  examples  of  what  can  be  achieved 
with  today's  supercomputers  and  standard  codes  using 
eddy-viscosity  models. 

In summary, from the papers delivered at the conference
it is very difficult to estimate whether a large eddy simulation
for a practical problem, such as a clean wing at a
relevant Reynolds number, will become feasible in the near
future. In order to reach that goal, significant research
work  on  both  algorithms  and  subgrid  scale  models  is 
needed.  A  few  preliminary  approaches  for  algorithmic 
improvement  were  shown  at  the  conference. 

2.8  Chemically  Reacting  Flows 

The  effective  use  of  CFD  for  viscous  hypersonic  react¬ 
ing  flows  is  one  of  the  present  challenges.  In  the  past, 
substantial  effort  has  been  devoted  to  this  research  area 
and  key  requirements  for  efficient  solution  algorithms 
have  been  identified  [30].  These  are  sharp  capturing  of 
strong  shocks,  robustness  in  regions  of  strong  flow  ex¬ 
pansion,  high  resolution  of  viscous  regions,  efficient 
treatment  of  adverse  grid  and  flow  situations  in  the  case 
of  complex  3-D  geometries,  and  effective  integration  of 
stiff  equations  introduced  by  the  large  chemical  source 
terms. 

The  two  conference  papers  devoted  to  reacting  flows  ad¬ 
dressed  these  algorithmic  issues.  The  paper  by  Rade- 
spiel  et  al.  [30]  reviewed  recent  progress  made  with  flux 
vector  splitting  methods  to  ensure  high  resolution  and 
robustness  for  hypersonic  viscous  reacting  flow  simula¬ 
tions.  Two  promising  approaches  recently  published  in 
the  literature  were  discussed  and  compared.  Both 
schemes  use  scalar  dissipation  functions  and  their  con¬ 
ceptual  differences  appear  in  the  resolution  of  shock 
waves.  Implementation  details  and  recommendations 
for  their  effective  use  for  viscous  flows  were  given.  Fur¬ 
thermore,  the  capabilities  of  the  multigrid  method  based 
on  explicit  multistage  time-stepping  schemes  were  in¬ 
vestigated  for  reacting  flows.  A  number  of  modest  mod¬ 
ifications  of  the  standard  multigrid  method  successfully 
used  for  subsonic  and  transonic  flow  problems  were  re¬ 
ported  in  order  to  ensure  fast  convergence  for  high 
Mach  number  flows  with  strong  shocks.  The  stiffness  of 
the  equations  introduced  by  the  large  chemical  source 
terms  is  removed  by  a  point  implicit  treatment.  Various 
computations  for  different  complex  flow  problems  were 
presented.  They  impressively  demonstrate  that  with  the 
reported  algorithmic  improvements  converged  flow  so¬ 
lutions  for  reacting  flows  over  complex  3-D  configura¬ 
tions  are  now  feasible. 

The paper by Coquel et al. [31] focused on the
extension of a hybrid upwind splitting method to non-equilibrium
flows. Based on the experience that the classical
Van Leer flux vector scheme is not suitable for viscous
calculations and the Roe type flux difference
solvers are not robust for hypersonic flows, a new upwind
approach was presented which basically combines
the  distinct  flux  vector  and  flux  difference  splitting  con¬ 
cepts  while  retaining  their  interesting  features.  The  pro¬ 
posed  method  is  a  combination  of  the  Van  Leer  scheme 
and  the  Osher  scheme  with  some  modifications  and  ex¬ 
tensions.  The  ability  of  the  new  method  to  resolve  vis¬ 
cous  hypersonic  reacting  flows  was  illustrated  by  vari¬ 
ous  results  including  internal  and  external  flow 
configurations. The time integration is performed by an
unfactored implicit scheme, which in the current implementation
leads to a somewhat slow convergence rate and
needs to be improved for further applications.
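
As background to the flux vector splitting concept referred to above, the following sketch shows the mass flux component of the classical Van Leer splitting only. It is not the hybrid scheme of [31], whose details are not given in this report; the function name and the sample states are hypothetical. For subsonic Mach numbers the mass flux is split into a forward and a backward part that are smooth through the sonic points and sum to the physical flux.

```python
def van_leer_mass_flux(rho, a, mach):
    """Return (f_plus, f_minus), the Van Leer split of the mass flux rho*a*mach.

    rho  : density
    a    : speed of sound
    mach : Mach number normal to the cell face
    """
    if mach >= 1.0:          # supersonic to the right: all flux is forward
        return rho * a * mach, 0.0
    if mach <= -1.0:         # supersonic to the left: all flux is backward
        return 0.0, rho * a * mach
    f_plus = 0.25 * rho * a * (mach + 1.0) ** 2
    f_minus = -0.25 * rho * a * (mach - 1.0) ** 2
    return f_plus, f_minus


# Illustrative states (hypothetical values, SI units).
samples = [van_leer_mass_flux(1.2, 340.0, m) for m in (-1.5, -0.3, 0.0, 0.7, 2.0)]
```

The two parts always add up to rho*a*mach, and f_plus is non-negative while f_minus is non-positive, so each part can be differenced in its own upwind direction.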

In  conclusion  the  two  papers  on  reacting  flows  covered 
the  key  issues  for  developing  efficient  numerical  tools 
for  the  simulation  of  complex  flows.  Very  promising  re¬ 
sults  were  presented,  illustrating  that  effective  predic¬ 
tions  in  terms  of  both  accuracy  and  efficiency  for  com¬ 
plex  configurations  are  now  feasible. 

2.9  Unsteady  Flows 

For  steady  flows,  substantial  CFD  capability  has  been 
achieved  over  the  last  two  decades  and  Euler/Navier- 
Stokes  solvers  are  intensively  used  in  aerodynamic  de¬ 
sign.  In  contrast,  although  some  isolated  unsteady  flow 
calculations  have  been  carried  out  for  various  classes  of 
problems,  numerical  simulation  of  unsteady  flow  fields 
based  on  Euler/Navier-Stokes  equations  is  certainly  not 
routine  for  industrial  applications,  due  to  the  excessive 
computational  effort  involved  in  these  calculations. 
From  the  algorithmic  point  of  view,  new  innovative  con¬ 
cepts  are  required,  which  substantially  cut  down  the 
costs  of  time  accurate  simulations.  This  is  especially  im¬ 
portant  for  viscous  flow  calculations,  where  a  very  fine 
mesh  near  the  wall  is  required  to  resolve  the  boundary 
layer.  Issues  that  are  central  to  unsteady  CFD  are  the  use 
of  efficient  implicit  time  integration  with  favorable  sta¬ 
bility  and  accuracy  characteristics,  moving  grids,  adap¬ 
tive  grids  with  local  grid  refinement/coarsening  and  par¬ 
allel  computing.  Moreover,  for  aeroelastic  applications 
efficient  coupling  strategies  are  required. 

Time  accurate  calculations  have  been  addressed  by  sev¬ 
eral  papers  (e.g.  [27],  [32],  [33],  [34],  [35]).  The  paper 
by  Pentaris  et  al.  [32]  focused  on  the  solution  of  the  un¬ 
steady  incompressible  2-D  Navier-Stokes  equations  us¬ 
ing  a  projection  methodology  developed  for  collocated 
grids.  Standard  numerical  schemes,  such  as  approximate 
factorization  techniques,  were  employed.  The  numerical 
results presented for some test cases were encouraging;
however, no remarks on the efficiency of the method
were given.

The paper delivered by Allen [33] was devoted to grid
adaption  for  unsteady  inviscid  airfoil  flows.  The  solu¬ 
tion  adaptive  grids  are  generated  by  a  new  transfinite  in¬ 
terpolation  technique.  An  interesting  approach  was  pre¬ 
sented,  in  which  adaption  is  performed  by  adapting  the 
interpolation  parameters  instead  of  the  physical  grid  po¬ 
sitions.  For  unsteady  calculations,  grid  adaption  is  per¬ 
formed  gradually  by  imposing  a  so-called  adaption  ve¬ 
locity  onto  each  grid  point.  The  grid  interpolation 
strategy  was  shown  to  be  well  suited  for  structured  mov¬ 
ing  grids.  It  is  very  flexible  and  requires  only  little  CPU 
time.  Steady  and  unsteady  airfoil  computations  were 
presented  illustrating  the  improved  results  from  the 
adapted  meshes.  For  the  calculation,  an  upwind  Euler 
solver  with  the  dual-time  implicit  approach  was  used, 
which  is  considerably  more  efficient  than  the  basic  ex¬ 
plicit  solver.  The  paper  focused  on  two-dimensional  in¬ 
viscid  applications,  so  that  the  flexibility  and  efficiency 
of  the  proposed  grid  adaption  strategy  are  still  to  be  ver¬ 
ified  for  both  viscous  and  three-dimensional  flows.  A 
time-varying  grid  technique  was  also  presented  by  [26]. 
Here,  the  time  integration  was  carried  out  with  a  second 
order  implicit  scheme. 
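
The dual-time implicit approach mentioned above can be sketched on a scalar model problem. This is an illustration under stated assumptions, not the Euler solver of [33]: the model equation du/dt = -lam * u, the second order backward difference in physical time, the explicit pseudo-time iteration and all names and parameter values are hypothetical. At each physical time step the implicit equation is solved by marching a modified residual to zero in pseudo-time, so that steady-state acceleration techniques can in principle be reused for the inner iteration.

```python
import math


def residual(u, lam):
    """Spatial residual R(u) of the model problem du/dt = -lam * u."""
    return -lam * u


def dual_time_step(u_n, u_nm1, dt, lam, dtau=0.05, tol=1.0e-12, max_iter=500):
    """Advance one physical time step with a second order backward difference.

    The unsteady residual
        R*(u) = R(u) - (3 u - 4 u_n + u_nm1) / (2 dt)
    is driven to zero by explicit marching in pseudo-time tau.
    """
    u = u_n
    for _ in range(max_iter):
        r_star = residual(u, lam) - (3.0 * u - 4.0 * u_n + u_nm1) / (2.0 * dt)
        if abs(r_star) < tol:
            break
        u += dtau * r_star
    return u


lam, dt, n_steps = 1.0, 0.1, 10
u_nm1 = math.exp(lam * dt)   # exact solution at t = -dt, used to start the scheme
u_n = 1.0                    # initial condition at t = 0
for _ in range(n_steps):
    u_n, u_nm1 = dual_time_step(u_n, u_nm1, dt, lam), u_n
```

After ten steps u_n approximates exp(-1) to within the second order truncation error of the physical time discretization; the pseudo-time step dtau affects only the convergence of the inner iteration, not the time accuracy.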

A  more  sophisticated  moving  grid  technique  was  pre¬ 
sented  by  Jones  et  al.  [27],  with  the  goal  of  computing 
aircraft  store  trajectories.  The  technique  is  based  on 
fully  unstructured  or  hybrid  meshes.  It  was  pointed  out 
that  the  geometric  conservation  law  has  to  be  satisfied 
within  the  framework  of  moving  grids  in  order  to  guar¬ 
antee  consistent  results.  So  far,  only  two-dimensional 
unsteady  results  have  been  achieved. 

The paper by Ruiz-Calavera et al. [35] addressed parametric
studies of a time accurate Euler code for oscillating
wings. A rather standard central scheme with artificial
dissipation and an explicit multistage time stepping
scheme was used. Effects of grid density and artificial
viscosity on the time accurate solutions were discussed,
showing the expected behavior. The code has been implemented
on a powerful parallel computer, namely the
Numerical Wind Tunnel of NAL in Japan. It was demonstrated
that parallel computing is a necessary ingredient
for effective three-dimensional unsteady flow calculations.

In summary, a few central issues for unsteady computations
were discussed by the conference papers. However,
no major progress in the development of algorithms
for efficient three-dimensional time accurate
calculations was presented.


3.  CONCLUDING  REMARKS 

In chapter 2 each specific subject of the meeting has already
been fully commented on, so that only general concluding
remarks are given here.

In the evaluator's opinion the theme of the symposium
"Progress and Challenges in CFD Methods and
Algorithms" was too encompassing and too ambitious for
a  3  1/2  day  long  AGARD  conference.  Many  papers  of 
great  interest  and  high  technical  standard  were  deliv¬ 
ered.  They  addressed  specific  challenges  in  CFD,  pro¬ 
posed  new  methods  or  modifications  to  known  method¬ 
ologies  and  presented  smaller  or  larger  progress.  On  the 
other  hand,  however,  quite  a  large  number  of  papers  of 
lower  quality  were  presented,  which  either  did  not  focus 
on  current  key  issues  of  algorithmic  research  or  mainly 
reinvented  well  known  results.  Probably  this  situation  is 
very  similar  to  all  other  large  CFD  conferences.  But 
measured  against  the  ambitious  theme  of  this  sympo¬ 
sium,  it  has  to  be  clearly  stated  that  in  many  areas  the 
Seville  conference  did  not  reflect  the  actual  status  of 
CFD  and  its  recent  progress.  Considering  Jameson's  ex¬ 
cellent  survey  paper,  it  is  obvious  that  several  important 
algorithmic  developments  and  recent  improvements 
were  not  addressed.  For  example,  no  paper  was  devoted 
to  aerodynamic  shape  optimization  and  multidisci¬ 
plinary  analysis,  topics  which  are  increasingly  important 
for  future  CFD  applications  in  industry.  Furthermore,  in 
some  areas  such  as  unstructured  grids  and  adaptive 
schemes,  CFD  is  much  further  developed  than  reported 
at  the  conference.  Since  many  leading  experts,  espe¬ 
cially  those  from  the  U.S.,  did  not  contribute  to  the  con¬ 
ference,  it  is  hard  to  expect  that  the  high  demands  of  the 
symposium  could  be  met. 

Nevertheless,  several  important  directions  of  algorith¬ 
mic  research  were  addressed,  which  are  expected  to  im¬ 
prove  the  capability  of  CFD  for  complex  applications  in 
the  industrial  environment.  These  included  parallel 
computing,  advanced  discretization  techniques,  fast  iter¬ 
ative  solvers  and  powerful  acceleration  techniques, 
adaptive  schemes  and  flexible  strategies  for  discretizing 
the  computational  domain.  Interesting  and  new  aspects 
of  these  techniques  were  discussed,  substantiating  their 
extended  potentials  and  improved  abilities.  In  most 
cases,  however,  the  superiority  of  the  more  sophisticated 
methods  to  the  well  established  standard  schemes  was 
only  demonstrated  for  simplified  test  problems,  for 
which  the  classical  methods  also  perform  quite  well. 
Very  often  results  were  shown  for  2-D  inviscid  and  lam¬ 
inar  viscous  flows.  Three-dimensional  calculations  were 
restricted  in  most  cases  to  inviscid  flows  or  simplified 
geometries.  Only  a  few  more  realistic  calculations  were 
presented.  To  make  a  step  forward,  it  is  very  important 
to apply the advanced methodologies to those problems
for which the standard methods show substantial deficiencies
in terms of accuracy and efficiency or do not
work  at  all.  One  of  the  grand  challenges  in  CFD  is  the 
effective  simulation  of  viscous  flows  at  realistic,  full 
scale  Reynolds  numbers  for  complex  configurations. 
This problem, although ideal for testing advanced discretization
and time integration schemes, was hardly
tackled  at  the  conference  even  for  simplified  geometries. 
Furthermore,  in  order  to  raise  the  confidence  level  of 
CFD  methods,  careful  grid  refinement  studies,  sensitiv¬ 
ity  investigations,  estimation  and  control  of  the  numeri¬ 
cal  error  as  well  as  detailed  code  validation  are  required 
for a wide class of relevant applications. In many papers,
these issues were only partly or not at all considered.

In  conclusion,  considerable  research  work  is  still  needed 
to  establish  CFD  as  an  effective  tool  in  the  aerodynamic 
design  process.  The  most  important,  but  probably  also 
the  most  limiting  factor,  is  turbulence  modelling,  a  sub¬ 
ject  which  was  outside  the  scope  of  this  symposium. 
With  respect  to  algorithms,  further  development  and  im¬ 
provement  remain  essential  but  have  to  be  directed  to¬ 
wards  the  real  challenges  in  CFD,  which  include: 

•  accurate  viscous  flow  simulation  at  relevant 
Reynolds  numbers 

•  effective  treatment  of  complex  configura¬ 
tions,  such  as  a  complete  aircraft 

•  efficient  simulation  of  more  complex  flows 
with  multiple  space  and  time  scales,  such  as 
unsteady  flows  or  reacting  flows 

•  large  eddy  simulation  for  practical  applica¬ 
tions 

•  aerodynamic  shape  optimization 

•  multidisciplinary  analysis  and  design 

The  Seville  symposium  was  a  step  in  the  right  direction. 
For  some  topics,  it  showed  some  good  promise  but  there 
is  still  considerable  work  to  be  done  to  meet  the  chal¬ 
lenges  of  industrial  CFD.  The  symposium  provided  a 
valuable  forum  for  exchange  of  information  about  re¬ 
cent  developments  and  achievements. 

1. SUMMARY

This paper presents a perspective on computational fluid
dynamics as a tool for aircraft design. It addresses the
requirements for effective industrial use, and trade-offs
between modelling accuracy and computational costs. Issues
in algorithm design are discussed in detail, together
with a unified approach to the design of shock capturing
algorithms.  Finally,  the  paper  discusses  the  use  of  tech¬ 
niques  drawn  from  control  theory  to  determine  optimal 
aerodynamic  shapes.  In  the  future  multidisciplinary  anal¬ 
ysis  and  optimization  should  be  combined  to  provide  an 
integrated  numerical  design  environment. 


2.  INTRODUCTION 

Computational  methods  first  began  to  have  a  significant 
impact  on  aerodynamic  analysis  and  design  in  the  period 
of  1965-75.  This  decade  saw  the  introduction  of  panel 
methods which could solve the linear flow models for
arbitrarily  complex  geometry  in  both  subsonic  and  super¬ 
sonic  flow  [58,  147,  179].  It  also  saw  the  appearance  of 
the  first  satisfactory  methods  for  treating  the  nonlinear 
equations  of  transonic  flow  [123, 122, 63, 64, 43, 54],  and 
the  development  of  the  hodograph  method  for  the  design 
of  shock  free  supercritical  airfoils  [15]. 

Computational  Fluid  Dynamics  (CFD)  has  now  matured 
to  the  point  at  which  it  is  widely  accepted  as  a  key  tool 
for  aerodynamic  design.  Algorithms  have  been  the  sub¬ 
ject  of  intensive  development  for  the  past  two  decades. 
The  principles  underlying  the  design  and  implementation 
of  robust  schemes  which  can  accurately  resolve  shock 
waves  and  contact  discontinuities  in  compressible  flows 
are  now  quite  well  established.  It  is  also  quite  well  under¬ 
stood  how  to  design  high  order  schemes  for  viscous  flow, 
including  compact  schemes  and  spectral  methods.  Adap¬ 
tive  refinement  of  the  mesh  interval  (h)  and  the  order  of 
approximations  (p)  has  been  successfully  exploited  both 
separately  and  in  combination  in  the  h-p  method  [126]. 
A  continuing  obstacle  to  the  treatment  of  configurations 
with  complex  geometry  has  been  the  problem  of  mesh 
generation. Several general techniques have been developed,
including algebraic transformations and methods
based on the solution of elliptic and hyperbolic equations.
In the last few years methods using unstructured meshes
have also begun to gain more general acceptance. The
Dassault-INRIA group led the way in developing a finite
element method for transonic potential flow. They
obtained  a  solution  for  a  complete  Falcon  50  as  early 
as  1982  [25].  Euler  methods  for  unstructured  meshes 
have  been  the  subject  of  intensive  development  by  several 
groups  since  1985  [110,  82,  81,  163,  14],  and  Navier- 
Stokes  methods  on  unstructured  meshes  have  also  been 
demonstrated  [117,  118,  11]. 


Despite  the  advances  that  have  been  made,  CFD  is  still 
not  being  exploited  as  effectively  as  one  would  like  in  the 
design process. This is partly due to the long set-up and
high costs, both human and computational, of complex
flow simulations. The essential requirements for industrial
use are:


1.  assured  accuracy 

2.  acceptable  computational  and  human  costs 

3.  fast  turn  around. 


Improvements  are  still  needed  in  all  three  areas.  In  par¬ 
ticular,  the  fidelity  of  modelling  of  high  Reynolds  number 
viscous  flows  continues  to  be  limited  by  computational 
costs.  Consequently  accurate  and  cost-effective  simula¬ 
tion  of  viscous  flow  at  Reynolds  numbers  associated  with 
full  scale  flight,  such  as  the  prediction  of  high  lift  devices, 
remains  a  challenge.  Several  routes  are  available  toward 
the  reduction  of  computational  costs,  including  the  re¬ 
duction  of  mesh  requirements  by  the  use  of  higher  order 
schemes,  improved  convergence  to  a  steady  state  by  so¬ 
phisticated  acceleration  methods,  fast  inversion  methods 
for  implicit  schemes,  and  the  exploitation  of  massively 
parallel  computers. 

Another  factor  limiting  the  effective  use  of  CFD  is  the 
lack  of  good  interfaces  to  computer  aided  design  (CAD) 
systems.  The  geometry  models  provided  by  existing  CAD 
systems  often  fail  to  meet  the  requirements  of  continuity 
and  smoothness  needed  for  flow  simulation,  with  the  con¬ 
sequence  that  they  must  be  modified  before  they  can  be 
used to provide the input for mesh generation. This bottleneck,
which impedes the automation of the mesh generation
process, needs to be eliminated, and the CFD software
should be fully integrated in a numerical design environment.
In addition to more accurate and cost-effective flow
prediction methods, better optimization methods are also
needed,  so  that  not  only  can  designs  be  rapidly  evaluated, 
but  directions  of  improvement  can  be  identified.  Posses¬ 
sion  of  techniques  which  result  in  a  faster  design  cycle 
gives  a  crucial  advantage  in  a  competitive  environment. 

A  critical  issue,  examined  in  the  next  section,  is  the  choice 
of  mathematical  models.  What  level  of  complexity  is 
needed  to  provide  sufficient  accuracy  for  aerodynamic 
design,  and  what  is  the  impact  on  cost  and  turn-around 
time?  Section  3  addresses  the  design  of  numerical  algo¬ 
rithms  for  flow  simulation.  Section  4  presents  the  results 
of  some  numerical  calculations  which  require  moderate 
computer  resources  and  could  be  completed  with  the  fast 
turn-around  required  by  industrial  users.  Section  5  dis¬ 
cusses  automatic  design  procedures  which  can  be  used 
to produce optimum aerodynamic designs. Finally, Section
7 offers an outlook for the future.


Paper  presented  at  the  AGARD  FDP  Symposium  on  “Progress  and  Challenges  in  CFD  Methods  and  Algorithms 

3. THE COMPLEXITY OF FLUID FLOW AND
MATHEMATICAL  MODELLING 

3.1  The  Hierarchy  of  Mathematical  Models 

Many  critical  phenomena  of  fluid  flow,  such  as  shock 
waves  and  turbulence,  are  essentially  non-linear.  They 
also  exhibit  extreme  disparities  of  scales.  While  the  ac¬ 
tual  thickness  of  a  shock  wave  is  of  the  order  of  a  mean 
free  path  of  the  gas  particles,  on  a  macroscopic  scale  its 
thickness  is  essentially  zero.  In  turbulent  flow  energy 
is  transferred  from  large  scale  motions  to  progressively 
smaller  eddies  until  the  scale  becomes  so  small  that  the 
motion is dissipated by viscosity. The ratio of the length
scale of the global flow to that of the smallest persisting
eddies is of the order $Re^{3/4}$, where $Re$ is the Reynolds number,
typically in the range of 30 million for an aircraft. In
order to resolve such scales in all three space directions a
computational grid with the order of $Re^{9/4}$ cells would be
required. This is beyond the range of any current or foreseeable
computer. Consequently mathematical models
with varying degrees of simplification have to be introduced
in order to make computational simulation of flow
feasible, and to produce viable and cost-effective methods.
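
The scaling argument above can be checked with a few lines of arithmetic. The sketch below simply evaluates the two power laws for the Reynolds number of 30 million quoted in the text; the function name is hypothetical, and the results are order-of-magnitude estimates only, since the constants of proportionality are not specified.

```python
def kolmogorov_estimates(reynolds):
    """Return (scale_ratio, cell_count) for a given Reynolds number.

    scale_ratio : ratio of the global length scale to the smallest eddies,
                  of the order Re^(3/4)
    cell_count  : cells needed to resolve that ratio in three directions,
                  of the order Re^(9/4) = (Re^(3/4))^3
    """
    scale_ratio = reynolds ** 0.75
    cell_count = reynolds ** 2.25
    return scale_ratio, cell_count


ratio, cells = kolmogorov_estimates(30.0e6)
print(f"length scale ratio ~ {ratio:.2e}, grid cells ~ {cells:.2e}")
```

For an aircraft Reynolds number of 30 million this gives a scale ratio of the order of 10^5 and a cell count between 10^16 and 10^17, which makes clear why direct resolution of all scales is not feasible.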

Figure 1 (supplied by Pradeep Raj) indicates a hierarchy
of models at different levels of simplification which
have  proved  useful  in  practice.  Efficient  flight  is  gen¬ 
erally  achieved  by  the  use  of  smooth  and  streamlined 
shapes  which  avoid  flow  separation  and  minimize  vis¬ 
cous  effects,  with  the  consequence  that  useful  predictions 
can be made using inviscid models. Inviscid calculations
with boundary layer corrections can provide quite accurate
predictions of lift and drag when the flow remains
attached, but iteration between the inviscid outer solution
and  the  inner  boundary  layer  solution  becomes  increas¬ 
ingly  difficult  with  the  onset  of  separation.  Procedures  for 
solving  the  full  viscous  equations  are  likely  to  be  needed 
for  the  simulation  of  arbitrary  complex  separated  flows, 
which  may  occur  at  high  angles  of  attack  or  with  bluff 
bodies.  In  order  to  treat  flows  at  high  Reynolds  numbers, 
one  is  generally  forced  to  estimate  turbulent  effects  by 
Reynolds  averaging  of  the  fluctuating  components.  This 
requires  the  introduction  of  a  turbulence  model.  As  the 
available  computing  power  increases  one  may  also  as¬ 
pire  to  large  eddy  simulation  (LES)  in  which  the  larger 
scale  eddies  are  directly  calculated,  while  the  influence 
of  turbulence  at  scales  smaller  than  the  mesh  interval  is 
represented  by  a  subgrid  scale  model. 


Figure 1: Hierarchy of Fluid Flow Models


3.2 Computational Costs

Computational costs vary drastically with the choice of
mathematical model. Panel methods can be effectively
used to solve the linear potential flow equation with
higher-end personal computers (with an Intel 80486 microprocessor,
for example). Studies of the dependency
of the result on mesh refinement, performed by this author
and others, have demonstrated that inviscid transonic
potential flow or Euler solutions for an airfoil can be accurately
calculated on a mesh with 160 cells around the
section, and 32 cells normal to the section. Using multigrid
techniques 10 to 25 cycles are enough to obtain a
converged result. Consequently airfoil calculations can
be performed in seconds on a Cray YMP, and can also
be performed on 486-class personal computers. Correspondingly
accurate three-dimensional inviscid calculations
can be performed for a wing on a mesh, say with
192x32x48 = 294,912 cells, in about 5 minutes on a single
processor Cray YMP, or less than a minute with eight
processors, or in 1 or 2 hours on a workstation such as a
Hewlett Packard 735 or an IBM 560 model.

Viscous simulations at high Reynolds numbers require
vastly greater resources. Careful two-dimensional studies
of mesh requirements have been carried out at Princeton
by Martinelli [114]. He found that on the order of 32
mesh intervals were needed to resolve a turbulent boundary
layer, in addition to 32 intervals between the boundary
layer and the far field, leading to a total of 64 intervals.
In order to prevent degradations in accuracy and convergence
due to excessively large aspect ratios (in excess of
1,000) in the surface mesh cells, the chordwise resolution
must also be increased to 512 intervals. Reasonably
accurate solutions can be obtained on a 512x64 mesh in
100 multigrid cycles. Translated to three dimensions, this
would imply the need for meshes with 5-10 million cells
(for example, 512x64x256 = 8,388,608 cells as shown
in Figure 2). When simulations are performed on less
fine meshes with, say, 500,000 to 1 million cells, it is very
hard to avoid mesh dependency in the solutions as well as
sensitivity to the turbulence model.


Figure 2: Mesh Requirements for a Viscous Simulation


A  typical  algorithm  requires  of  the  order  of  5,000  floating 
point  operations  per  mesh  point  in  one  multigrid  iteration. 
With 10 million mesh points, the operation count is of the
order of $0.5 \times 10^{11}$ per cycle. Given a computer capable
of sustaining $10^{11}$ operations per second (100 gigaflops),
200 cycles could then be performed in 100 seconds. Simulations
of unsteady viscous flows (flutter, buffet) would
be  likely  to  require  1,000-10,000  time  steps.  A  further 
progression to large eddy simulation of complex configurations
would require even greater resources. The following
estimate is due to W.H. Jou [90]. Suppose that a
conservative estimate of the size of eddies in a boundary
layer that ought to be resolved is 1/5 of the boundary layer
thickness. Assuming that 10 points are needed to resolve
a single eddy, the mesh interval should then be 1/50 of
the boundary layer thickness. Moreover, since the eddies
are three-dimensional, the same mesh interval should be
used in all three directions. Now, if the boundary layer
thickness is of the order of 0.01 of the chord length, 5,000
intervals will be needed in the chordwise direction, and
for a wing with an aspect ratio of 10, 50,000 intervals will
be needed in the spanwise direction. Thus, of the order of
50 x 5,000 x 50,000 or 12.5 billion mesh points would
be  needed  in  the  boundary  layer.  If  the  time  dependent 
behavior  of  the  eddies  is  to  be  fully  resolved  using  time 
steps  on  the  order  of  the  time  for  a  wave  to  pass  through  a 
mesh interval, and one allows for a total time equal to the
time required for waves to travel three times the length
of the chord, of the order of 15,000 time steps would be
needed. Performance beyond the teraflop ($10^{12}$ operations
per second) will be needed to attempt calculations
of this nature, which also have an information content far
beyond what is needed for engineering analysis and design.
The designer does not need to know the details of
the eddies in the boundary layer. The primary purpose
of such calculations is to improve the prediction of averaged
quantities such as skin friction, and the prediction of
global  behavior  such  as  the  onset  of  separation.  The  main 
current  use  of  Navier-Stokes  and  large  eddy  simulations 
is  to  gain  an  improved  insight  into  the  physics  of  turbulent 
flow,  which  may  in  turn  lead  to  the  development  of  more 
comprehensive  and  reliable  turbulence  models. 
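
The estimates in this section can be reproduced with simple integer arithmetic. The sketch below uses only the figures quoted in the text (operations per mesh point, mesh size, sustained speed, and the assumptions of the estimate attributed to W.H. Jou); the variable and function names are hypothetical.

```python
# Steady viscous simulation: cost per multigrid cycle and time for 200 cycles.
FLOPS_PER_POINT_PER_CYCLE = 5_000
MESH_POINTS = 10_000_000
SUSTAINED_FLOPS = 10 ** 11            # 100 gigaflops
CYCLES = 200

ops_per_cycle = FLOPS_PER_POINT_PER_CYCLE * MESH_POINTS      # 0.5 x 10^11
steady_seconds = CYCLES * ops_per_cycle / SUSTAINED_FLOPS


def les_boundary_layer_estimate(eddies_per_bl=5, points_per_eddy=10,
                                bl_per_chord=100, aspect_ratio=10,
                                chords_travelled=3):
    """Mesh and time step counts for large eddy simulation of a wing.

    eddies_per_bl    : resolved eddy size is 1/5 of the boundary layer thickness
    points_per_eddy  : mesh points needed to resolve a single eddy
    bl_per_chord     : boundary layer thickness is 0.01 of the chord
    aspect_ratio     : span divided by chord
    chords_travelled : total simulated time in chord passage times
    """
    normal = eddies_per_bl * points_per_eddy      # intervals across the layer
    chordwise = normal * bl_per_chord
    spanwise = chordwise * aspect_ratio
    mesh_points = normal * chordwise * spanwise
    time_steps = chords_travelled * chordwise     # one mesh interval per step
    return normal, chordwise, spanwise, mesh_points, time_steps


normal, chordwise, spanwise, les_points, les_steps = les_boundary_layer_estimate()
print(ops_per_cycle, steady_seconds, les_points, les_steps)
```

Changing any single assumption, for example the number of points per eddy, changes the mesh count with the third power, which is why such estimates should be read as orders of magnitude.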


3.3 Turbulence Modelling

It  is  doubtful  whether  a  universally  valid  turbulence 
model,  capable  of  describing  all  complex  flows,  could  be 
devised  [52].  Algebraic  models  [30, 9]  have  proved  fairly 
satisfactory  for  the  calculation  of  attached  and  slightly 
separated  wing  flows.  These  models  rely  on  the  boundary 
layer  concept,  usually  incorporating  separate  formulas  for 
the  inner  and  outer  layers,  and  they  require  an  estimate 
of  a  length  scale  which  depends  on  the  thickness  of  the 
boundary  layer.  The  estimation  of  this  quantity  by  a 
search  for  a  maximum  of  the  vorticity  times  a  distance 
to  the  wall,  as  in  the  Baldwin-Lomax  model,  can  lead  to 
ambiguities  in  internal  flows,  and  also  in  complex  vorti¬ 
cal  flows  over  slender  bodies  and  highly  swept  or  delta 
wings  [40,  115].  The  Johnson-King  model  [88],  which 
allows  for  non-equilibrium  effects  through  the  introduc¬ 
tion  of  an  ordinary  differential  equation  for  the  maximum 
shear  stress,  has  improved  the  prediction  of  flows  with 
shock  induced  separation  [148, 91]. 
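
The ambiguity in the length scale search described above can be made concrete with a synthetic example. The sketch below is an assumption-laden illustration, not the Baldwin-Lomax model itself: it uses the simplified function F(y) = y |omega(y)| without the near-wall damping factor, and the vorticity profile (a boundary layer plus a detached vortical layer) and all names are hypothetical.

```python
import math


def length_scale_candidates(y, vorticity):
    """Return the wall distances at which F(y) = y * |vorticity| peaks locally."""
    f = [yi * abs(wi) for yi, wi in zip(y, vorticity)]
    return [y[i] for i in range(1, len(f) - 1)
            if f[i] > f[i - 1] and f[i] >= f[i + 1]]


# Synthetic profile: wall-bounded vorticity decaying over a thickness of 0.02,
# plus a separate vortical layer centred at a wall distance of 0.2.
y = [0.001 * i for i in range(1, 400)]
omega = [100.0 * math.exp(-yi / 0.02)
         + 20.0 * math.exp(-((yi - 0.2) / 0.03) ** 2)
         for yi in y]

candidates = length_scale_candidates(y, omega)
print(candidates)
```

The search returns two local maxima, one associated with the boundary layer and one with the detached layer. The outer peak is the larger of the two, so a global search would select a length scale unrelated to the boundary layer thickness, which is the kind of ambiguity noted for vortical flows.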

Closure models depending on the solution of transport
equations are widely accepted for industrial applications.
These models eliminate the need to estimate a length scale
by detecting the edge of the boundary layer. Eddy viscosity
models typically use two equations for the turbulent
kinetic energy k and the dissipation rate ε, or a pair of
equivalent quantities [89, 178, 160, 1, 121, 35]. Models
of this type generally tend to present difficulties in the
region very close to the wall. They also tend to be badly
conditioned for numerical solution. The k-l model [154]
is designed to alleviate this problem by taking advantage
of the linear behaviour of the length scale l near the wall.
In  an  alternative  approach  to  the  design  of  models  which 
are  more  amenable  to  numerical  solution,  new  models 
requiring  the  solution  of  one  transport  equation  have  re¬ 
cently  been  introduced  [10,  159].  The  performance  of 
the  algebraic  models  remains  competitive  for  wing  flows, 
but  the  one-  and  two-equation  models  show  promise  for 
broader  classes  of  flows.  In  order  to  achieve  greater  uni¬ 
versality,  research  is  also  being  pursued  on  more  complex 
Reynolds  stress  transport  models,  which  require  the  solu¬ 
tion  of  a  larger  number  of  transport  equations. 

Another  direction  of  research  is  the  attempt  to  devise 
more  rational  models  via  renormalization  group  (RNG) 
theory [182, 155]. Both algebraic and two-equation $k$-$\epsilon$
models  devised  by  this  approach  have  shown  promising 
results  [116]. 


The selection of sufficiently accurate mathematical models
and a judgment of their cost-effectiveness ultimately
rests with industry. Aircraft and spacecraft designs normally
pass through the three phases of conceptual design,
preliminary design, and detailed design. Correspondingly,
the appropriate CFD models will vary in complexity.
In the conceptual and preliminary design phases, the
emphasis will be on relatively simple models which can
give results with very rapid turn-around and low computer
costs, in order to evaluate alternative configurations and
perform quick parametric studies. The detailed design
stage requires the most complete simulation that can be
achieved with acceptable cost. In the past, the low level
of confidence that could be placed on numerical predictions
has forced the extensive use of wind tunnel testing
at an early stage of the design. This practice was very
expensive. The limited number of models that could be
fabricated also limited the range of design variations that
could be evaluated. It can be anticipated that in the future,
the role of wind tunnel testing in the design process
will be more one of verification. Experimental research
to improve our understanding of the physics of complex
flows will continue, however, to play a vital role.


4.  CFD  ALGORITHMS 

4.1  Difficulties  of  Flow  Simulation 

The computational simulation of fluid flow presents a
number of severe challenges for algorithm design. At the
level of inviscid modeling, the inherent nonlinearity of
the fluid flow equations leads to the formation of singularities
such as shock waves and contact discontinuities.
Moreover, the geometric configurations of interest are
extremely complex, and generally contain sharp edges
which lead to the shedding of vortex sheets. Extreme
gradients near stagnation points or wing tips may also
lead to numerical errors that can have global influence.
Numerically generated entropy may be convected from
the leading edge, for example, causing the formation of
a numerically induced boundary layer which can lead to
separation. The need to treat exterior domains of infinite
extent is also a source of difficulty. Boundary conditions
imposed at artificial outer boundaries may cause reflected
waves which significantly interfere with the flow. When
viscous effects are also included in the simulation, the
extreme difference of the scales in the viscous boundary
layer and the outer flow, which is essentially inviscid, is
another source of difficulty, forcing the use of meshes with
extreme variations in the mesh intervals. For these reasons,
CFD has been a driving force for the development
of numerical algorithms.


4.2  Structured  and  Unstructured  Meshes 

The algorithm designer faces a number of critical decisions.
The first choice that must be made is the nature
of the mesh used to divide the flow field into discrete
subdomains. The discretization procedure must allow for
the treatment of complex configurations. The principal
alternatives are Cartesian meshes, body-fitted curvilinear
meshes, and unstructured tetrahedral meshes. Each of
these approaches has advantages which have led to their
use. The Cartesian mesh minimizes the complexity of
the algorithm at interior points and facilitates the use of
high order discretization procedures, at the expense of
greater complexity, and possibly a loss of accuracy, in the
treatment of boundary conditions at curved surfaces. This
difficulty may be alleviated by using mesh refinement procedures
near the surface. With their aid, schemes which
use Cartesian meshes have recently been developed to
treat very complex configurations [120, 149, 22, 94].

Body-fitted meshes have been widely used and are particularly
well suited to the treatment of viscous flow because
they readily allow the mesh to be compressed near
the body surface. With this approach, the problem of
mesh generation itself has proved to be a major pacing
item. The most commonly used procedures are algebraic
transformations [7, 44, 46, 156], methods based on
the solution of elliptic equations, pioneered by Thompson
[170, 171, 157, 158], and methods based on the solution of
hyperbolic equations marching out from the body [161].
In order to treat very complex configurations it generally
proves expedient to use a multiblock [177, 150] procedure,
with separately generated meshes in each block,
which may then be patched at block faces, or allowed
to overlap, as in the Chimera scheme [19, 20]. While a
number of interactive software systems for grid generation
have been developed, such as EAGLE, GRIDGEN,
and ICEM, the generation of a satisfactory grid for a very
complex configuration may require months of effort.

The alternative is to use an unstructured mesh in which the
domain is subdivided into tetrahedra. This in turn requires
the development of solution algorithms capable of yielding
the required accuracy on unstructured meshes. This
approach has been gaining acceptance, as it is becoming
apparent that it can lead to a speed-up and reduction in
the cost of mesh generation that more than offsets the increased
complexity and cost of the flow simulations. Two
competing procedures for generating triangulations which
have both proved successful are Delaunay triangulation
[41, 11], based on concepts introduced at the beginning
of the century by Voronoi [175], and the moving front
method [111].


4.3 Finite Difference, Finite Volume, and Finite Element Schemes

Associated with the choice of mesh type is the formulation of
the discretization procedure for the equations of fluid flow,
which can be expressed as differential conservation laws.
In the Cartesian tensor notation, let $x_i$ be the coordinates,
$p$, $\rho$, $T$, and $E$ the pressure, density, temperature, and
total energy, and $u_i$ the velocity components. Using the
convention that summation over $j = 1, 2, 3$ is implied by a
repeated subscript $j$, each conservation equation has the
form

$$\frac{\partial w}{\partial t} + \frac{\partial f_j}{\partial x_j} = 0. \tag{1}$$

For the mass equation

$$w = \rho, \quad f_j = \rho u_j.$$

For the $i$ momentum equation

$$w = \rho u_i, \quad f_j = \rho u_i u_j + p\,\delta_{ij} - \sigma_{ij},$$

where $\sigma_{ij}$ is the viscous stress tensor. For the energy
equation

$$w = \rho E, \quad f_j = (\rho E + p)\,u_j - \sigma_{jk} u_k - \kappa \frac{\partial T}{\partial x_j},$$

where $\kappa$ is the coefficient of heat conduction. The pressure
is related to the density and energy by the equation of state

$$p = (\gamma - 1)\,\rho \left( E - \tfrac{1}{2} u_i u_i \right), \tag{2}$$

in which $\gamma$ is the ratio of specific heats. In the Navier-Stokes
equations the viscous stresses are assumed to be
linearly proportional to the rate of strain, or

$$\sigma_{ij} = \mu \left( \frac{\partial u_i}{\partial x_j} + \frac{\partial u_j}{\partial x_i} \right) + \lambda\,\delta_{ij} \frac{\partial u_k}{\partial x_k}, \tag{3}$$

where $\mu$ and $\lambda$ are the coefficients of viscosity and bulk
viscosity, and usually $\lambda = -2\mu/3$.
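As a small illustration of the equation of state (2), the following Python sketch recovers the pressure from the conserved variables $\rho$, $\rho u_i$, and $\rho E$. The function names and the default value $\gamma = 1.4$ (air) are assumptions made for this example and do not come from the text.

```python
def pressure(rho, momentum, rho_E, gamma=1.4):
    """Equation (2): p = (gamma - 1) * rho * (E - u_i u_i / 2).

    momentum holds the components rho*u_i; rho_E is the total energy per unit volume.
    """
    kinetic = 0.5 * sum(m * m for m in momentum) / rho  # rho * u_i u_i / 2
    return (gamma - 1.0) * (rho_E - kinetic)


def conserved(rho, velocity, p, gamma=1.4):
    """Inverse map: build (rho, rho*u_i, rho*E) from density, velocity and pressure."""
    momentum = [rho * u for u in velocity]
    rho_E = p / (gamma - 1.0) + 0.5 * rho * sum(u * u for u in velocity)
    return rho, momentum, rho_E


# Round trip: the pressure used to build the state is recovered from it.
state = conserved(1.2, [100.0, 20.0, -5.0], 101325.0)
print(pressure(*state))
```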

The finite difference method, which requires the use of
a Cartesian or a structured curvilinear mesh, directly approximates
the differential operators appearing in these
equations. In the finite volume method [112], the discretization
is accomplished by dividing the domain of
the flow into a large number of small subdomains, and
applying the conservation laws in the integral form

$$\frac{\partial}{\partial t} \int_{\Omega} w \, dV + \int_{\partial\Omega} \mathbf{f} \cdot d\mathbf{S} = 0.$$

Here $\mathbf{f}$ is the flux appearing in equation (1) and $d\mathbf{S}$ is
the directed surface element of the boundary $\partial\Omega$ of the
domain $\Omega$. The use of the integral form has the advantage
that no assumption of the differentiability of the solutions
is implied, with the result that it remains a valid statement
for a subdomain containing a shock wave. In general the
subdomains could be arbitrary, but it is convenient to use
either hexahedral cells in a body conforming curvilinear
mesh or tetrahedrons in an unstructured mesh.

Alternative discretization schemes may be obtained by
storing flow variables at either the cell centers or the vertices.
These variations are illustrated in Figure 3 for the
two-dimensional case.

[Figure 3: Structured and Unstructured Discretizations. 3a: Cell Centered Scheme. 3b: Vertex Scheme.]

With a cell-centered scheme the
discrete conservation law takes the form

$$\frac{d}{dt}(wV) + \sum_{\text{faces}} \mathbf{f} \cdot \mathbf{S} = 0, \tag{4}$$

where $V$ is the cell volume, and $\mathbf{f}$ is now a numerical
estimate of the flux vector through each face. $\mathbf{f}$ may be
evaluated from values of the flow variables in the cells
separated by each face, using upwind biasing to allow for
the directions of wave propagation. With hexahedral cells,
equation (4) is very similar to a finite difference scheme
in curvilinear coordinates. Under a transformation to
curvilinear coordinates $\xi_j$, equation (1) becomes

$$\frac{\partial}{\partial t}(Jw) + \frac{\partial}{\partial \xi_i}\left( J \frac{\partial \xi_i}{\partial x_j} f_j \right) = 0, \tag{5}$$

where $J$ is the Jacobian determinant of the transformation
matrix $\left[ \partial x_i / \partial \xi_j \right]$. The transformed flux $J \frac{\partial \xi_i}{\partial x_j} f_j$ corresponds
to the dot product of the flux $\mathbf{f}$ with a vector face area
$J \frac{\partial \xi_i}{\partial x_j}$, while $J$ represents the transformation of the cell
volume. The finite volume form (4) has the advantages
that it is valid for both structured and unstructured meshes,
and that it assures that a uniform flow exactly satisfies the
equations, because $\sum_{\text{faces}} \mathbf{S} = 0$ for a closed control volume.
Finite difference schemes do not necessarily satisfy
this constraint because of the discretization errors in evaluating
$\partial \xi_i / \partial x_j$ and the inversion of the transformation matrix.
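The statement that a uniform flow satisfies the finite volume form (4) exactly can be checked numerically. The sketch below is an illustration in two dimensions with hypothetical helper names: it builds the outward face vectors of an irregular quadrilateral cell and evaluates the flux balance for a flux that is the same on every face.

```python
def face_vectors(vertices):
    """Outward face vectors S of a 2D cell whose vertices are listed counter-clockwise.

    For an edge from (x0, y0) to (x1, y1) the outward normal scaled by the
    edge length is (y1 - y0, -(x1 - x0)).
    """
    n = len(vertices)
    vectors = []
    for i in range(n):
        (x0, y0), (x1, y1) = vertices[i], vertices[(i + 1) % n]
        vectors.append((y1 - y0, -(x1 - x0)))
    return vectors


def flux_balance(vertices, flux):
    """Sum over faces of f . S, using the same flux vector on every face."""
    return sum(flux[0] * sx + flux[1] * sy for sx, sy in face_vectors(vertices))


# An irregular quadrilateral: the face vectors close, so a uniform flux balances.
quad = [(0.0, 0.0), (1.3, 0.1), (1.1, 0.9), (-0.2, 1.2)]
print(flux_balance(quad, (3.0, -2.0)))  # zero up to rounding error
```

The balance vanishes because the face vectors of any closed polygon sum to zero, independently of how distorted the cell is.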

A cell-vertex finite volume scheme can be derived by taking
the union of the cells surrounding a given vertex as the
control volume for that vertex [55, 71, 139]. In equation
(4), $V$ is now the sum of the volumes of the surrounding
cells, while the flux balance is evaluated over the outer
faces of the polyhedral control volume. In the absence of
upwind biasing the flux vector is evaluated by averaging
over the corners of each face. This has the advantage of
remaining accurate on an irregular or unstructured mesh.

An alternative route to the discrete equations is provided
by the finite element method. Whereas the finite difference
and finite volume methods approximate the differential
and integral operators, the finite element method
proceeds by inserting an approximate solution into the
exact equations. On multiplying by a test function $\phi$ and
integrating by parts over space, one obtains the weak form

$$\frac{\partial}{\partial t} \int_{\Omega} \phi\, w \, d\Omega = \int_{\Omega} \mathbf{f} \cdot \nabla\phi \, d\Omega - \int_{\partial\Omega} \phi\, \mathbf{f} \cdot d\mathbf{S}, \tag{6}$$

which is also valid in the presence of discontinuities in the
flow. In the Galerkin method the approximate solution is
expanded in terms of the same family of functions as those
from which the test functions are drawn. By choosing
test functions with local support, separate equations are
obtained for each node. For example, if a tetrahedral
mesh is used, and $\phi$ is piecewise linear, with a nonzero
value only at a single node, the equations at each node
have a stencil which contains only the nearest neighbors.
In this case the finite element approximation corresponds
closely to a finite volume scheme. If a piecewise linear
approximation to the flux $\mathbf{f}$ is used in the evaluation of
the integrals on the right hand side of equation (6), these
integrals reduce to formulas which are identical to the flux
balance of the finite volume scheme.


Thus the finite difference and finite volume methods lead
to essentially similar schemes on structured meshes, while
the finite volume method is essentially equivalent to a finite
element method with linear elements when a tetrahedral
mesh is used. Provided that the flow equations
are expressed in the conservation law form (1), all three
methods lead to an exact cancellation of the fluxes through
interior cell boundaries, so that the conservative property
of the equations is preserved. The important role of this
property in ensuring correct shock jump conditions was
pointed out by Lax and Wendroff [97].


4.4 Non-oscillatory Shock Capturing Schemes

4.4.1 Local Extremum Diminishing (LED) Schemes

The discretization procedures which have been described
in the last section lead to nondissipative approximations
to the Euler equations. Dissipative terms may be needed
for two reasons. The first is the possibility of undamped
oscillatory modes. The second reason is the need for the
clean capture of shock waves and contact discontinuities
without undesirable oscillations. An extreme overshoot
could result in a negative value of an inherently positive
quantity such as the pressure or density. The next sections
summarize a unified approach to the construction of
nonoscillatory schemes via the introduction of controlled
diffusive and antidiffusive terms. This is the line adhered
to in the author’s own work.

The development of non-oscillatory schemes has been a
prime focus of algorithm research for compressible flow.
Consider a general semi-discrete scheme of the form

$$\frac{d}{dt} v_j = \sum_{k \neq j} c_{jk} (v_k - v_j). \tag{7}$$

A maximum cannot increase and a minimum cannot decrease
if the coefficients $c_{jk}$ are non-negative, since at a
maximum $v_k - v_j \le 0$, and at a minimum $v_k - v_j \ge 0$.
Thus the condition

$$c_{jk} \ge 0, \quad k \neq j \tag{8}$$

is sufficient to ensure stability in the maximum norm.
Moreover, if the scheme has a compact stencil, so that
$c_{jk} = 0$ when $j$ and $k$ are not nearest neighbors, a local
maximum cannot increase and a local minimum cannot decrease.
This local extremum diminishing (LED) property
prevents the birth and growth of oscillations. The one-dimensional
conservation law

$$\frac{\partial u}{\partial t} + \frac{\partial}{\partial x} f(u) = 0$$

provides a useful model for analysis. In this case waves
are propagated with a speed $a(u) = \partial f / \partial u$, and the solution
is constant along the characteristics $dx/dt = a(u)$. Thus the
LED property is satisfied. In fact the total variation

$$TV(u) = \int_{-\infty}^{\infty} \left| \frac{\partial u}{\partial x} \right| dx$$

of a solution of this equation does not increase, provided
that any discontinuity appearing in the solution satisfies an
entropy condition [96]. Harten proposed that difference
schemes ought to be designed so that the discrete total
variation cannot increase [56]. If the end values are fixed,
the total variation can be expressed as

$$TV(u) = 2 \left( \sum \text{maxima} - \sum \text{minima} \right).$$

Thus a LED scheme is also total variation diminishing
(TVD). Positivity conditions of the type expressed
in equations (7) and (8) lead to diagonally dominant
schemes, and are the key to the elimination of improper
oscillations. The positivity conditions may be realized by
the introduction of diffusive terms or by the use of upwind
biasing in the discrete scheme. Unfortunately, they
may also lead to severe restrictions on accuracy unless the
coefficients have a complex nonlinear dependence on the
solution.

4.4.2 Artificial Diffusion and Upwinding

Following the pioneering work of Godunov [51], a variety
of dissipative and upwind schemes designed to have good
shock capturing properties have been developed during
the past two decades [162, 23, 98, 100, 146, 130, 56,
129, 166, 5, 68, 183, 62, 180, 13, 12, 11]. If the one-dimensional
scalar conservation law

$$\frac{\partial v}{\partial t} + \frac{\partial}{\partial x} f(v) = 0 \tag{9}$$

is represented by a three point scheme

$$\frac{dv_j}{dt} = c^{+}_{j+1/2} (v_{j+1} - v_j) + c^{-}_{j-1/2} (v_{j-1} - v_j),$$

the scheme is LED if

$$c^{+}_{j+1/2} \ge 0, \quad c^{-}_{j-1/2} \ge 0. \tag{10}$$

A conservative semidiscrete approximation to the one-dimensional
conservation law can be derived by subdividing
the line into cells. Then the evolution of the value
$v_j$ in the $j$th cell is given by

$$\Delta x \frac{dv_j}{dt} + h_{j+1/2} - h_{j-1/2} = 0, \tag{11}$$

where $h_{j+1/2}$ is an estimate of the flux between cells $j$ and
$j + 1$. The simplest estimate is the arithmetic average
$(f_{j+1} + f_j)/2$, but this leads to a scheme that does not
satisfy the positivity conditions. To correct this, one may
add a dissipative term and set

$$h_{j+1/2} = \tfrac{1}{2} (f_{j+1} + f_j) - \alpha_{j+1/2} (v_{j+1} - v_j). \tag{12}$$
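In order to estimate the required value of the coefficient
$\alpha_{j+1/2}$, let $a_{j+1/2}$ be a numerical estimate of the wave speed
$\partial f / \partial u$,

$$a_{j+1/2} = \begin{cases} \dfrac{f_{j+1} - f_j}{v_{j+1} - v_j} & \text{if } v_{j+1} \neq v_j \\[2ex] \left. \dfrac{\partial f}{\partial v} \right|_{v = v_j} & \text{if } v_{j+1} = v_j. \end{cases} \tag{13}$$

Then

$$h_{j+1/2} - h_{j-1/2} = -\left( \alpha_{j+1/2} - \tfrac{1}{2} a_{j+1/2} \right) \Delta v_{j+1/2} + \left( \alpha_{j-1/2} + \tfrac{1}{2} a_{j-1/2} \right) \Delta v_{j-1/2},$$

where

$$\Delta v_{j+1/2} = v_{j+1} - v_j,$$

and the LED condition (10) is satisfied if

$$\alpha_{j+1/2} \ge \tfrac{1}{2} \left| a_{j+1/2} \right|. \tag{14}$$

If one takes

$$\alpha_{j+1/2} = \tfrac{1}{2} \left| a_{j+1/2} \right|,$$

one obtains the first order upwind scheme

$$h_{j+1/2} = \begin{cases} f_j & \text{if } a_{j+1/2} > 0 \\ f_{j+1} & \text{if } a_{j+1/2} < 0. \end{cases}$$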

This  is  the  least  diffusive  first  order  scheme  which  satisfies 
the  LED  condition.  In  this  sense  upwinding  is  a  natural 
approach  to  the  construction  of  non-oscillatory  schemes. 
It may be noted that the successful treatment of transonic
potential flow also involved the use of upwind biasing.
This  was  first  introduced  by  Murman  and  Cole  to  treat  the 
transonic  small  disturbance  equation  [123]. 
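The construction of this subsection can be sketched in a few lines of Python. The example below assembles the numerical flux (12) with the coefficient $\alpha_{j+1/2} = \frac{1}{2}|a_{j+1/2}|$ and the wave speed estimate (13), and advances scheme (11) for the inviscid Burgers equation. The forward Euler time stepping, the periodic boundary, the time step and the initial pulse are all assumptions added for illustration, since the text treats only the semi-discrete form.

```python
def upwind_step(v, flux, wave_speed, dt, dx):
    """One forward-Euler step of scheme (11) with h from (12), alpha = |a|/2, periodic."""
    n = len(v)
    h = []  # h[j] stores h_{j+1/2}, the flux between cells j and j+1
    for j in range(n):
        vl, vr = v[j], v[(j + 1) % n]
        fl, fr = flux(vl), flux(vr)
        # Wave speed estimate of equation (13).
        a = (fr - fl) / (vr - vl) if vr != vl else wave_speed(vl)
        alpha = 0.5 * abs(a)
        h.append(0.5 * (fr + fl) - alpha * (vr - vl))
    # h[j - 1] is h_{j-1/2}; index -1 wraps around for the periodic boundary.
    return [v[j] - dt / dx * (h[j] - h[j - 1]) for j in range(n)]


def total_variation(v):
    """Discrete total variation on a periodic mesh."""
    return sum(abs(v[j] - v[j - 1]) for j in range(len(v)))


# Burgers equation f = v^2 / 2 with a square pulse as initial data.
v = [1.0 if 10 <= j < 20 else 0.0 for j in range(50)]
tv_start = total_variation(v)
for _ in range(40):
    v = upwind_step(v, lambda s: 0.5 * s * s, lambda s: s, dt=0.5, dx=1.0)
print(max(v), min(v), total_variation(v), tv_start)
```

With this time step the fully discrete update is a convex combination of neighboring values, so the maximum does not rise above its initial value, the minimum does not fall below it, and the total variation does not grow.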


Another important requirement of discrete schemes is
that they should exclude nonphysical solutions which do
not satisfy appropriate entropy conditions [95], which
require the convergence of characteristics towards admissible
discontinuities. This places more stringent
bounds on the minimum level of numerical viscosity
[113, 169, 128, 131]. In the case that the numerical flux
function is strictly convex, Aiso has recently proved [2]
that it is sufficient that

$$\alpha_{j+1/2} > \max \left\{ \tfrac{1}{2} \left| a_{j+1/2} \right|, \ \epsilon \, \mathrm{sign}(v_{j+1} - v_j) \right\}$$

for $\epsilon > 0$. Thus the numerical viscosity should be rounded
out and not allowed to reach zero at a point where the
wave speed $a(u) = \partial f / \partial u$ approaches zero. This justifies, for
example, Harten’s entropy fix [56].
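A commonly quoted form of such an entropy fix replaces $|a|$ by a smooth function that stays positive as $a$ passes through zero. The sketch below shows that form; the function name and the threshold parameter `eps` are assumptions for illustration, and the text itself does not state the formula.

```python
def rounded_abs(a, eps):
    """Rounded-out |a|: equal to |a| for |a| >= eps, and to
    (a^2 + eps^2) / (2 eps) otherwise, so it never falls below eps / 2."""
    if abs(a) >= eps:
        return abs(a)
    return 0.5 * (a * a + eps * eps) / eps


def diffusion_coefficient(a, eps):
    """alpha = |a| / 2 as in condition (14), with the rounded-out |a|."""
    return 0.5 * rounded_abs(a, eps)


print(rounded_abs(0.0, 0.1), rounded_abs(0.05, 0.1), rounded_abs(-2.0, 0.1))
```

The two branches agree at $|a| = \epsilon$ in both value and slope, so the coefficient remains a smooth function of the wave speed.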

Higher order schemes can be constructed by introducing
higher order diffusive terms. Unfortunately these have
larger stencils and coefficients of varying sign which are
not compatible with the conditions (8) for a LED scheme,
and it is known that schemes which satisfy these conditions
are at best first order accurate in the neighborhood
of an extremum. It proves useful in the following development
to introduce the concept of essentially local
extremum diminishing (ELED) schemes. These are defined
to be schemes which satisfy the condition that in
the limit as the mesh width $\Delta x \to 0$, local maxima are
non-increasing, and local minima are non-decreasing.


4.4.3 High Resolution Switched Schemes: Jameson-Schmidt-Turkel (JST) Scheme

Higher order non-oscillatory schemes can be derived by
introducing anti-diffusive terms in a controlled manner.
An early attempt to produce a high resolution scheme
by this approach is the Jameson-Schmidt-Turkel (JST)
scheme [85]. Suppose that anti-diffusive terms are introduced
by subtracting neighboring differences to produce
a third order diffusive flux

$$d_{j+1/2} = \alpha_{j+1/2} \left\{ \Delta v_{j+1/2} - \tfrac{1}{2} \left( \Delta v_{j+3/2} + \Delta v_{j-1/2} \right) \right\}, \tag{15}$$

which is an approximation, up to sign, to $\tfrac{1}{2} \alpha \, \Delta x^3 \, \partial^3 v / \partial x^3$. The positivity
condition (8) is violated by this scheme. It proves that it
generates substantial oscillations in the vicinity of shock
waves, which can be eliminated by switching locally to the
first order scheme. The JST scheme therefore introduces
blended diffusion of the form

$$d_{j+1/2} = \epsilon^{(2)}_{j+1/2} \Delta v_{j+1/2} - \epsilon^{(4)}_{j+1/2} \left( \Delta v_{j+3/2} - 2 \Delta v_{j+1/2} + \Delta v_{j-1/2} \right). \tag{16}$$

The idea is to use variable coefficients $\epsilon^{(2)}_{j+1/2}$ and $\epsilon^{(4)}_{j+1/2}$
which produce a low level of diffusion in regions where
the solution is smooth, but prevent oscillations near discontinuities.
If $\epsilon^{(2)}_{j+1/2}$ is constructed so that it is of order
$\Delta x^2$ where the solution is smooth, while $\epsilon^{(4)}_{j+1/2}$ is of order
unity, both terms in $d_{j+1/2}$ will be of order $\Delta x^3$.

The JST scheme has proved very effective in practice in
numerous calculations of complex steady flows, and conditions
under which it could be a total variation diminishing
(TVD) scheme have been examined by Swanson
and Turkel [165]. An alternative statement of sufficient
conditions on the coefficients $\epsilon^{(2)}_{j+1/2}$ and $\epsilon^{(4)}_{j+1/2}$ for the JST
scheme to be LED is as follows:

Theorem 1 (Positivity of the JST scheme)

Suppose that whenever either $v_{j+1}$ or $v_j$ is an extremum
the coefficients of the JST scheme satisfy

$$\epsilon^{(2)}_{j+1/2} \ge \tfrac{1}{2} \left| a_{j+1/2} \right|, \quad \epsilon^{(4)}_{j+1/2} = 0. \tag{17}$$

Then the JST scheme is local extremum diminishing
(LED).

Proof: We need only consider the rate of change of $v$ at
extremal points. Suppose that $v_j$ is an extremum. Then

$$\epsilon^{(4)}_{j+1/2} = \epsilon^{(4)}_{j-1/2} = 0,$$

and the semi-discrete scheme (11) reduces to

$$\Delta x \frac{dv_j}{dt} = \left( \epsilon^{(2)}_{j+1/2} - \tfrac{1}{2} a_{j+1/2} \right) \Delta v_{j+1/2} - \left( \epsilon^{(2)}_{j-1/2} + \tfrac{1}{2} a_{j-1/2} \right) \Delta v_{j-1/2},$$

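and each coefficient has the required sign. □

In order to construct $\epsilon^{(2)}_{j+1/2}$ and $\epsilon^{(4)}_{j+1/2}$ with the desired
properties define

$$R(u, v) = \begin{cases} \left| \dfrac{u - v}{|u| + |v|} \right|^q & \text{if } u \neq 0 \text{ or } v \neq 0 \\[2ex] 0 & \text{if } u = v = 0, \end{cases} \tag{18}$$

where $q$ is a positive integer. Then $R(u, v) = 1$ if $u$ and $v$
have opposite signs. Otherwise $R(u, v) < 1$. Now set

$$Q_j = R(\Delta v_{j+1/2}, \Delta v_{j-1/2}), \quad Q_{j+1/2} = \max(Q_j, Q_{j+1}),$$

and

$$\epsilon^{(2)}_{j+1/2} = \alpha_{j+1/2} Q_{j+1/2}, \quad \epsilon^{(4)}_{j+1/2} = \tfrac{1}{2} \alpha_{j+1/2} \left( 1 - Q_{j+1/2} \right). \tag{19}$$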

4.4.4 Symmetric Limited Positive (SLIP) Scheme

An alternative route to high resolution without oscillation
is to introduce flux limiters to guarantee the satisfaction
of the positivity condition (8). The use of limiters dates
back to the work of Boris and Book [23]. A particularly
simple way to introduce limiters, proposed by the author
in 1984 [68], is to use flux limited dissipation. In this
scheme the third order diffusion defined by equation (15)
is modified by the insertion of limiters which produce an
equivalent three point scheme with positive coefficients.
The original scheme [68] can be improved in the following
manner so that less restrictive flux limiters are required.
Let $L(u, v)$ be a limited average of $u$ and $v$ with the
following properties:

P1. $L(u, v) = L(v, u)$

P2. $L(\alpha u, \alpha v) = \alpha L(u, v)$

P3. $L(u, u) = u$

P4. $L(u, v) = 0$ if $u$ and $v$ have opposite signs: otherwise
$L(u, v)$ has the same sign as $u$ and $v$.

Properties (P1-P3) are natural properties of an average.
Property (P4) is needed for the construction of a LED or
TVD scheme.

It is convenient to introduce the notation

$$\phi(r) = L(1, r) = L(r, 1),$$

where according to (P4) $\phi(r) \ge 0$. It follows from (P2)
on setting $\alpha = 1/u$ or $1/v$ that

$$L(u, v) = \phi\left( \frac{v}{u} \right) u = \phi\left( \frac{u}{v} \right) v.$$

Also it follows on setting $v = 1$ and $u = r$ that

$$\phi(r) = r \, \phi\left( \frac{1}{r} \right).$$

Thus, if there exists $r < 0$ for which $\phi(r) > 0$, then
$\phi(1/r) < 0$. The only way to ensure that $\phi(r) \ge 0$ is to
require $\phi(r) = 0$ for all $r < 0$, corresponding to property
(P4).

Now one defines the diffusive flux for a scalar conservation
law as

$$d_{j+1/2} = \alpha_{j+1/2} \left\{ \Delta v_{j+1/2} - L\left( \Delta v_{j+3/2}, \Delta v_{j-1/2} \right) \right\}. \tag{20}$$

Set

$$r^{+} = \frac{\Delta v_{j+3/2}}{\Delta v_{j-1/2}}, \quad r^{-} = \frac{\Delta v_{j-3/2}}{\Delta v_{j+1/2}},$$

and

$$L(\Delta v_{j+3/2}, \Delta v_{j-1/2}) = \phi(r^{+}) \, \Delta v_{j-1/2},$$

$$L(\Delta v_{j-3/2}, \Delta v_{j+1/2}) = \phi(r^{-}) \, \Delta v_{j+1/2}.$$

Then,

$$\Delta x \frac{dv_j}{dt} = \left\{ \alpha_{j+1/2} - \tfrac{1}{2} a_{j+1/2} + \alpha_{j-1/2} \phi(r^{-}) \right\} \Delta v_{j+1/2} - \left\{ \alpha_{j-1/2} + \tfrac{1}{2} a_{j-1/2} + \alpha_{j+1/2} \phi(r^{+}) \right\} \Delta v_{j-1/2}. \tag{21}$$

Thus the scheme satisfies the LED condition if $\alpha_{j+1/2} \ge \tfrac{1}{2} |a_{j+1/2}|$
for all $j$, and $\phi(r) \ge 0$, which is assured by
property (P4) on $L$. At the same time it follows from
property (P3) that the first order diffusive flux is canceled
when $\Delta v$ is smoothly varying and of constant sign.
Schemes constructed by this formulation will be referred
to as symmetric limited positive (SLIP) schemes. This
result may be summarized as

Theorem 2 (Positivity of the SLIP scheme)

Suppose that the discrete conservation law (11) contains
a limited diffusive flux as defined by equation (20). Then
the positivity condition (14), together with the properties
(P1-P4) for limited averages, are sufficient to ensure
satisfaction of the LED principle that a local maximum
cannot increase and a local minimum cannot decrease. □

A variety of limiters may be defined which meet the requirements
of properties (P1-P4). Define

$$S(u, v) = \tfrac{1}{2} \left\{ \mathrm{sign}(u) + \mathrm{sign}(v) \right\},$$

which vanishes if $u$ and $v$ have opposite signs.

Then two limiters which are appropriate are the following
well-known schemes:

1. Minmod:

$$L(u, v) = S(u, v) \min(|u|, |v|)$$

2. Van Leer:

$$L(u, v) = S(u, v) \frac{2 |u| |v|}{|u| + |v|}$$

In order to produce a family of limiters which contains
these as special cases it is convenient to set
$$L(u, v) = \tfrac{1}{2} D(u, v) \, (u + v),$$

where $D(u, v)$ is a factor which should deflate the arithmetic
average, and become zero if $u$ and $v$ have opposite
signs. Take

$$D(u, v) = 1 - R(u, v) = 1 - \left| \frac{u - v}{|u| + |v|} \right|^q, \tag{22}$$

where $R(u, v)$ is the same function that was introduced
in the JST scheme, and $q$ is a positive integer. Then
$D(u, v) = 0$ if $u$ and $v$ have opposite signs. Also if $q = 1$,
$L(u, v)$ reduces to minmod, while if $q = 2$, $L(u, v)$ is
equivalent to Van Leer’s limiter. By increasing $q$ one can
generate a sequence of limited averages which approach
a limit defined by the arithmetic mean truncated to zero
when $u$ and $v$ have opposite signs.
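The claim that the family defined by equation (22) contains minmod and Van Leer's limiter as the cases $q = 1$ and $q = 2$ can be checked directly. The Python sketch below uses hypothetical function names; the guard for $u = v = 0$ follows the definition of $R$ in equation (18).

```python
def limited_average(u, v, q):
    """L(u, v) = D(u, v) (u + v) / 2 with D from equation (22)."""
    if u == 0.0 and v == 0.0:
        return 0.0
    D = 1.0 - abs((u - v) / (abs(u) + abs(v))) ** q
    return 0.5 * D * (u + v)


def sign(x):
    return (x > 0) - (x < 0)


def S(u, v):
    """Vanishes when u and v have opposite signs."""
    return 0.5 * (sign(u) + sign(v))


def minmod(u, v):
    return S(u, v) * min(abs(u), abs(v))


def van_leer(u, v):
    if u == 0.0 and v == 0.0:
        return 0.0
    return S(u, v) * 2.0 * abs(u) * abs(v) / (abs(u) + abs(v))


for u, v in [(1.0, 3.0), (-2.0, -0.5), (1.0, -1.0), (0.0, 2.0)]:
    print(u, v, limited_average(u, v, 1), minmod(u, v),
          limited_average(u, v, 2), van_leer(u, v))
```

For arguments of the same sign, $q = 1$ gives $\frac{1}{2}(u + v - |u - v|)$, which is the smaller of the two in magnitude, and $q = 2$ gives $2uv/(u + v)$, the harmonic mean.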

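When the terms are regrouped, it can be seen that with
this limiter the SLIP scheme is exactly equivalent to the
JST scheme, with the switch defined as

$$Q_{j+1/2} = R\left( \Delta v_{j+3/2}, \Delta v_{j-1/2} \right), \quad \epsilon^{(2)}_{j+1/2} = \alpha_{j+1/2} Q_{j+1/2}, \quad \epsilon^{(4)}_{j+1/2} = \tfrac{1}{2} \alpha_{j+1/2} \left( 1 - Q_{j+1/2} \right).$$

This formulation thus unifies the JST and SLIP schemes.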


4.4.5  Essentially  Local  Extremum  Diminishing  (ELED) 
Scheme  with  Soft  Limiter 


The limiters defined by the formula (22) have the disadvantage
that they are active at smooth extrema, reducing
the local accuracy of the scheme to first order. In order
to prevent this, the SLIP scheme can be relaxed to
give an essentially local extremum diminishing (ELED)
scheme which is second order accurate at smooth extrema
by the introduction of a threshold in the limited average.
Therefore redefine $D(u, v)$ as

$$D(u, v) = 1 - \left| \frac{u - v}{\max\left( |u| + |v|, \ \epsilon \Delta x^{r} \right)} \right|^q, \tag{23}$$

where $r = \tfrac{3}{2}$, $q \ge 2$. This reduces to the previous definition
if $|u| + |v| > \epsilon \Delta x^{r}$.

In any region where the solution is smooth, $\Delta v_{j+3/2} - \Delta v_{j-1/2}$
is of order $\Delta x^2$. In fact if there is a smooth extremum in
the neighborhood of $v_j$ or $v_{j+1}$, a Taylor series expansion
indicates that $\Delta v_{j+3/2}$, $\Delta v_{j+1/2}$ and $\Delta v_{j-1/2}$ are each individually
of order $\Delta x^2$, since $dv/dx = 0$ at the extremum. It may
be verified that second order accuracy is preserved at a
smooth extremum if $q \ge 2$. On the other hand the limiter
acts in the usual way if $|\Delta v_{j+3/2}|$ or $|\Delta v_{j-1/2}| > \epsilon \Delta x^{r}$,
and it may also be verified that in the limit $\Delta x \to 0$
local maxima are non increasing and local minima are
non decreasing [79]. Thus the scheme is essentially local
extremum diminishing (ELED).


The effect of the “soft limiter” is not only to improve the
accuracy: the introduction of a threshold below which
extrema of small amplitude are accepted also usually results
in a faster rate of convergence to a steady state, and
decreases the likelihood of limit cycles in which the limiter
interacts unfavorably with the corrections produced
by the updating scheme. In a scheme recently proposed
by Venkatakrishnan a threshold is introduced precisely
for this purpose [174].
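The effect of the threshold in equation (23) can be seen in a short sketch. The function name and the default values `eps=1.0` and `q=2` are assumptions made for illustration; the text leaves the constant $\epsilon$ unspecified, and $r = 3/2$ is used as in equation (23).

```python
def soft_limited_average(u, v, dx, q=2, eps=1.0, r=1.5):
    """L(u, v) = D(u, v) (u + v) / 2 with the thresholded D of equation (23)."""
    threshold = eps * dx ** r
    denom = max(abs(u) + abs(v), threshold)
    D = 1.0 - abs((u - v) / denom) ** q
    return 0.5 * D * (u + v)


dx = 0.1
# Small differences of opposite sign, as near a smooth extremum: the average
# is barely deflated, where the hard limiter of equation (22) would give zero.
print(soft_limited_average(2.0e-4, -1.0e-4, dx))
# Large differences of opposite sign: the limiter acts in the usual way.
print(soft_limited_average(1.0, -0.5, dx))
```

When $|u| + |v|$ exceeds the threshold the denominator is unchanged, so the function coincides with the hard limiter in that regime.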


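4.4.6 Upstream Limited Positive (USLIP) Schemes

By adding the anti-diffusive correction purely from the
upstream side one may derive a family of upstream limited
positive (USLIP) schemes. Corresponding to the original
SLIP scheme defined by equation (20), a USLIP scheme
is obtained by setting

$$d_{j+1/2} = \alpha_{j+1/2} \left\{ \Delta v_{j+1/2} - L\left( \Delta v_{j+1/2}, \Delta v_{j-1/2} \right) \right\}$$

if $a_{j+1/2} > 0$, or

$$d_{j+1/2} = \alpha_{j+1/2} \left\{ \Delta v_{j+1/2} - L\left( \Delta v_{j+1/2}, \Delta v_{j+3/2} \right) \right\}$$

if $a_{j+1/2} < 0$. If $\alpha_{j+1/2} = \tfrac{1}{2} |a_{j+1/2}|$ one recovers a standard
high resolution upwind scheme in semi-discrete form.
Consider the case that $a_{j+1/2} > 0$ and $a_{j-1/2} > 0$. If one
sets

$$r^{+} = \frac{\Delta v_{j+1/2}}{\Delta v_{j-1/2}}, \quad r^{-} = \frac{\Delta v_{j-3/2}}{\Delta v_{j-1/2}},$$

the scheme reduces to

$$\Delta x \frac{dv_j}{dt} = -\tfrac{1}{2} \left\{ \phi(r^{+}) \, a_{j+1/2} + \left( 2 - \phi(r^{-}) \right) a_{j-1/2} \right\} \Delta v_{j-1/2}.$$

To assure the correct sign to satisfy the LED criterion the
flux limiter must now satisfy the additional constraint that
$\phi(r) \le 2$.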

The USLIP formulation is essentially equivalent to standard
upwind schemes [130, 166]. Both the SLIP and USLIP
constructions can be implemented on unstructured
meshes [75, 79]. The anti-diffusive terms are then calculated
by taking the scalar product of the vectors defining
an edge with the gradient in the adjacent upstream and
downstream cells.
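The additional constraint $\phi(r) \le 2$ can be examined numerically for the limiter family built from equation (22), using $\phi(r) = L(1, r)$. The sketch below is an illustration with a hypothetical function name; the sampled range of $r$ is an arbitrary choice.

```python
def phi(r, q):
    """phi(r) = L(1, r) for L(u, v) = D(u, v) (u + v) / 2 with D from equation (22)."""
    D = 1.0 - abs((1.0 - r) / (1.0 + abs(r))) ** q
    return 0.5 * D * (1.0 + r)


samples = [0.01 * k for k in range(1, 10001)]  # 0 < r <= 100
for q in (1, 2, 3):
    print(q, max(phi(r, q) for r in samples))
```

On this sample the bound holds for $q = 1$ (minmod, where $\phi \le 1$) and for $q = 2$ (Van Leer, where $\phi = 2r/(1 + r) < 2$), but not for $q = 3$, which suggests that only the first two members of the family are suitable for the upstream construction without further modification.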


4.4. 7  ^sterns  of  Conservation  Laws:  Flux  Splitting  and 
Flux-Difference  Splitting 

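Steger and Warming [162] first showed how to generalize
the concept of upwinding to the system of conservation
laws

$$\frac{\partial w}{\partial t} + \frac{\partial}{\partial x} f(w) = 0 \tag{24}$$

by the concept of flux splitting. Suppose that the flux is
split as $f = f^+ + f^-$ where $\partial f^+/\partial w$ and $\partial f^-/\partial w$ have positive and
negative eigenvalues. Then the first order upwind scheme
is produced by taking the numerical flux to be

$$h_{j+\frac12} = f_j^+ + f_{j+1}^-.$$

This can be expressed in viscosity form as

$$h_{j+\frac12} = \frac12\left(f_{j+1}^+ + f_j^+\right) - \frac12\left(f_{j+1}^+ - f_j^+\right) + \frac12\left(f_{j+1}^- + f_j^-\right) + \frac12\left(f_{j+1}^- - f_j^-\right) = \frac12\left(f_{j+1} + f_j\right) - d_{j+\frac12},$$

where the diffusive flux is

$$d_{j+\frac12} = \frac12 \Delta\left(f^+ - f^-\right)_{j+\frac12}. \tag{25}$$

Roe derived the alternative formulation of flux difference
splitting [146] by distributing the corrections due to the
flux difference in each interval upwind and downwind to
obtain

$$\Delta x \frac{dw_j}{dt} + \left(f_{j+1} - f_j\right)^- + \left(f_j - f_{j-1}\right)^+ = 0,$$

where now the flux difference $f_{j+1} - f_j$ is split. The
corresponding diffusive flux is

$$d_{j+\frac12} = \frac12\left(\left(f_{j+1} - f_j\right)^+ - \left(f_{j+1} - f_j\right)^-\right).$$

Following Roe's derivation, let $A_{j+\frac12}$ be a mean value
Jacobian matrix exactly satisfying the condition

$$f_{j+1} - f_j = A_{j+\frac12}\left(w_{j+1} - w_j\right). \tag{26}$$

$A_{j+\frac12}$ may be calculated by substituting the weighted
averages

$$u = \frac{\sqrt{\rho_{j+1}}\,u_{j+1} + \sqrt{\rho_j}\,u_j}{\sqrt{\rho_{j+1}} + \sqrt{\rho_j}}, \qquad H = \frac{\sqrt{\rho_{j+1}}\,H_{j+1} + \sqrt{\rho_j}\,H_j}{\sqrt{\rho_{j+1}} + \sqrt{\rho_j}} \tag{27}$$

into the standard formulas for the Jacobian matrix $A = \partial f/\partial w$.
A splitting according to characteristic fields is now obtained
by decomposing $A_{j+\frac12}$ as

$$A_{j+\frac12} = T \Lambda T^{-1}, \tag{28}$$

where the columns of $T$ are the eigenvectors of $A_{j+\frac12}$,
and $\Lambda$ is a diagonal matrix of the eigenvalues. Now the
corresponding diffusive flux is

$$\frac12\left|A_{j+\frac12}\right|\left(w_{j+1} - w_j\right),$$

where

$$\left|A_{j+\frac12}\right| = T |\Lambda| T^{-1}$$

and $|\Lambda|$ is the diagonal matrix containing the absolute
values of the eigenvalues.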
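Condition (26) can be checked numerically. The sketch below assumes the one-dimensional Euler equations for a perfect gas with state $w = (\rho, \rho u, \rho E)$ and $\gamma = 1.4$; the function names and the two test states are illustrative choices, not taken from the paper. It builds the Jacobian from the weighted averages (27) and compares $A_{j+\frac12}(w_{j+1} - w_j)$ with the exact flux difference.

```python
import math

GAMMA = 1.4  # assumed ratio of specific heats


def from_primitive(rho, u, p):
    """Conservative state (rho, rho*u, rho*E) from density, velocity, pressure."""
    return (rho, rho * u, p / (GAMMA - 1.0) + 0.5 * rho * u * u)


def pressure(w):
    rho, m, e = w
    return (GAMMA - 1.0) * (e - 0.5 * m * m / rho)


def flux(w):
    """Euler flux f(w) = (rho*u, rho*u^2 + p, rho*u*H)."""
    rho, m, e = w
    u = m / rho
    p = pressure(w)
    return (m, m * u + p, u * (e + p))


def enthalpy(w):
    """Total enthalpy H = (rho*E + p) / rho."""
    return (w[2] + pressure(w)) / w[0]


def roe_matrix(wl, wr):
    """Jacobian df/dw evaluated at the square-root-density weighted averages (27)."""
    sl, sr = math.sqrt(wl[0]), math.sqrt(wr[0])
    u = (sl * wl[1] / wl[0] + sr * wr[1] / wr[0]) / (sl + sr)
    h = (sl * enthalpy(wl) + sr * enthalpy(wr)) / (sl + sr)
    g1 = GAMMA - 1.0
    return [
        [0.0, 1.0, 0.0],
        [0.5 * (GAMMA - 3.0) * u * u, (3.0 - GAMMA) * u, g1],
        [u * (0.5 * g1 * u * u - h), h - g1 * u * u, GAMMA * u],
    ]


wl = from_primitive(1.0, 0.75, 1.0)
wr = from_primitive(0.125, 0.0, 0.1)
A = roe_matrix(wl, wr)
dw = [wr[i] - wl[i] for i in range(3)]
df = [flux(wr)[i] - flux(wl)[i] for i in range(3)]
A_dw = [sum(A[i][k] * dw[k] for k in range(3)) for i in range(3)]
mismatch = max(abs(A_dw[i] - df[i]) for i in range(3))
print(mismatch)
```

The mismatch is at the level of rounding error even for this large jump, which is the sense in which the mean value Jacobian satisfies (26) exactly.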


4.4.8  Alternative  Splittings 

Characteristic splitting has the advantages that it introduces
the minimum amount of diffusion to exclude the
growth of local extrema of the characteristic variables, and
that with the Roe linearization it allows a discrete shock
structure with a single interior point. To reduce the computational
complexity one may replace $|A|$ by $\alpha I$ where
if $\alpha$ is at least equal to the spectral radius $\max |\lambda(A)|$,
then the positivity conditions will still be satisfied. Then
the first order scheme simply has the scalar diffusive flux

$$d_{j+\frac12} = \frac12 \alpha_{j+\frac12}\left(w_{j+1} - w_j\right). \tag{29}$$

The JST scheme with scalar diffusive flux captures shock
waves with about 3 interior points, and it has been widely
used for transonic flow calculations because it is both
robust and computationally inexpensive.

An intermediate class of schemes can be formulated by
defining the first order diffusive flux as a combination of
differences of the state and flux vectors

$$d_{j+\frac12} = \frac12 \alpha^*_{j+\frac12}\, c \left(w_{j+1} - w_j\right) + \frac12 \beta_{j+\frac12}\left(f_{j+1} - f_j\right), \tag{30}$$

where the factor $c$ is included in the first term to make
$\alpha^*_{j+\frac12}$ and $\beta_{j+\frac12}$ dimensionless. Schemes of this class
are fully upwind in supersonic flow if one takes $\alpha^*_{j+\frac12} = 0$
and $\beta_{j+\frac12} = \mathrm{sign}(M)$ when the absolute value of the Mach
number $M$ exceeds 1. The flux vector $f$ can be decomposed
as

$$f = u w + f_p, \tag{31}$$

where

$$f_p = \begin{pmatrix} 0 \\ p \\ up \end{pmatrix}. \tag{32}$$

Then

$$f_{j+1} - f_j = \bar{u}\left(w_{j+1} - w_j\right) + \bar{w}\left(u_{j+1} - u_j\right) + f_{p_{j+1}} - f_{p_j}, \tag{33}$$

where $\bar{u}$ and $\bar{w}$ are the arithmetic averages

$$\bar{u} = \frac12\left(u_{j+1} + u_j\right), \qquad \bar{w} = \frac12\left(w_{j+1} + w_j\right).$$

Thus these schemes are closely related to schemes which
introduce separate splittings of the convective and pressure
terms, such as the wave-particle scheme [141, 8], the
advection upwind splitting method (AUSM) [106, 176],
and the convective upwind and split pressure (CUSP)
schemes [76].

In order to examine the shock capturing properties of these
various schemes, consider the general case of a first order
diffusive flux of the form

$$d_{j+\frac12} = \frac12 \alpha_{j+\frac12} B_{j+\frac12}\left(w_{j+1} - w_j\right), \tag{34}$$

where the matrix $B_{j+\frac12}$ determines the properties of the
scheme and the scaling factor $\alpha_{j+\frac12}$ is included for convenience.
All the previous schemes can be obtained by
representing $B_{j+\frac12}$ as a polynomial in the matrix $A_{j+\frac12}$
defined by equation (26). Schemes of this class were
considered by Van Leer [99]. According to the Cayley-Hamilton
theorem, a matrix satisfies its own characteristic
equation. Therefore the third and higher powers of $A$ can
be eliminated, and there is no loss of generality in limiting
$B_{j+\frac12}$ to a polynomial of degree 2,

$$B_{j+\frac12} = \alpha_0 I + \alpha_1 A_{j+\frac12} + \alpha_2 A^2_{j+\frac12}. \tag{35}$$

The characteristic upwind scheme for which $B_{j+\frac12} = \left|A_{j+\frac12}\right|$ is obtained by
substituting $A_{j+\frac12} = T\Lambda T^{-1}$, $A^2_{j+\frac12} = T\Lambda^2 T^{-1}$. Then $\alpha_0$,
$\alpha_1$, and $\alpha_2$ are determined from the three equations

$$\alpha_0 + \alpha_1 \lambda_k + \alpha_2 \lambda_k^2 = |\lambda_k|, \qquad k = 1, 2, 3.$$

The same representation remains valid for three dimensional
flow because $A_{j+\frac12}$ still has only three distinct
eigenvalues $u$, $u + c$, $u - c$.
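The three equations for $\alpha_0$, $\alpha_1$, $\alpha_2$ amount to fitting a quadratic through the points $(\lambda_k, |\lambda_k|)$. A small sketch, with illustrative names and sample values of $u$ and $c$ chosen here for demonstration, solves them by divided differences:

```python
def upwind_polynomial(u, c):
    """Coefficients (a0, a1, a2) with a0 + a1*lam + a2*lam**2 = |lam| at lam = u, u+c, u-c."""
    l1, l2, l3 = u, u + c, u - c
    y1, y2, y3 = abs(l1), abs(l2), abs(l3)
    d12 = (y2 - y1) / (l2 - l1)            # first divided difference
    d23 = (y3 - y2) / (l3 - l2)
    a2 = (d23 - d12) / (l3 - l1)           # second divided difference
    a1 = d12 - a2 * (l1 + l2)
    a0 = y1 - a1 * l1 - a2 * l1 * l1
    return a0, a1, a2


def evaluate(coeffs, lam):
    a0, a1, a2 = coeffs
    return a0 + a1 * lam + a2 * lam * lam


subsonic = upwind_polynomial(0.5, 1.0)     # eigenvalues 0.5, 1.5, -0.5
supersonic = upwind_polynomial(2.0, 1.0)   # eigenvalues 2, 3, 1 (all positive)
print(subsonic, supersonic)
```

In the supersonic case all eigenvalues have the same sign, and the fit collapses to $\alpha_0 = \alpha_2 = 0$, $\alpha_1 = 1$, that is $B = A$: the diffusive flux then cancels the downwind half of the central flux and the scheme is fully upwind.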


4.4.9  Analysis  of  Stationary  Discrete  Shocks 


Figure  4:  Shock  structure  for  single  interior  point. 

The ideal model of a discrete shock is illustrated in figure (4).
Suppose that $w_L$ and $w_R$ are left and right
states which satisfy the jump conditions for a stationary
shock, and that the corresponding fluxes are $f_L = f(w_L)$
and $f_R = f(w_R)$. Since the shock is stationary $f_L = f_R$.
The ideal discrete shock has constant states $w_L$ to the left
and $w_R$ to the right, and a single point with an intermediate
value $w_A$. The intermediate value is needed to allow
the discrete solution to correspond to a true solution in
which the shock wave does not coincide with an interface
between two mesh cells.

Schemes corresponding to one, two or three terms in equation (35)
are examined in [80]. The analysis of these three
cases shows that a discrete shock structure with a single
interior point is supported by artificial diffusion that satisfies
the two conditions that

1. it produces an upwind flux if the flow is determined
to be supersonic through the interface

2. it satisfies a generalized eigenvalue problem for the
exit from the shock of the form

$$\left(A_{AR} - \alpha_{AR} B_{AR}\right)\left(w_R - w_A\right) = 0, \tag{36}$$

where $A_{AR}$ is the linearized Jacobian matrix and $B_{AR}$
is the matrix defining the diffusion for the interface $AR$.
This follows from the equilibrium condition $h_{RA} = h_{RR}$
for the cell $j + 1$ in figure 4. These two conditions are
satisfied by both the characteristic scheme and also the
CUSP scheme, provided that the coefficients of convective
diffusion and pressure differences are correctly balanced.
Scalar diffusion does not satisfy the first condition. In the
case of the CUSP scheme (30) equation (36) reduces to

$$\left(A_{RA} + \frac{\alpha^* c}{1 + \beta} I\right)\left(w_R - w_A\right) = 0.$$

Thus $w_R - w_A$ is an eigenvector of the Roe matrix $A_{RA}$,
and $-\frac{\alpha^* c}{1 + \beta}$ is the corresponding eigenvalue. Since the
eigenvalues are $u$, $u + c$, and $u - c$, the only choice which
leads to positive diffusion when $u > 0$ is $u - c$, yielding
the relationship

$$\alpha^* c = (1 + \beta)(c - u), \qquad 0 < u < c.$$

Thus there is a one parameter family of schemes which
support the ideal shock structure. The term $\beta\left(f_R - f_A\right)$
contributes to the diffusion of the convective terms. Allowing
for the split (31), the total effective coefficient of
convective diffusion is $\alpha c = \alpha^* c + \beta \bar{u}$. A CUSP scheme
with low numerical diffusion is then obtained by taking
$\alpha = |M|$, leading to the coefficients illustrated in figure 5.


Figure 5: Diffusion Coefficients.


4.4.10 CUSP and Characteristic Schemes Admitting
Constant Total Enthalpy in Steady Flow

In steady flow the stagnation enthalpy $H$ is constant, corresponding
to the fact that the energy and mass conservation
equations are consistent when the constant factor $H$
is removed from the energy equation. Discrete and semi-discrete
schemes do not necessarily satisfy this property.
In the case of a semi-discrete scheme expressed in viscosity
form, equations (11) and (12), a solution with constant
$H$ is admitted if the viscosity for the energy equation reduces
to the viscosity for the continuity equation with $\rho$
replaced by $\rho H$. When the standard characteristic decomposition
(28) is used, the viscous fluxes for $\rho$ and
$\rho H$ which result from composition of the fluxes for the
characteristic variables do not have this property, and $H$
is not constant in the discrete solution. In practice there
is an excursion of $H$ in the discrete shock structure which
represents a local heat source. In very high speed flows
the corresponding error in the temperature may lead to a
wrong prediction of associated effects such as chemical
reactions.

The source of the error in the stagnation enthalpy is the
discrepancy between the convective terms

$$u \begin{pmatrix} \rho \\ \rho u \\ \rho H \end{pmatrix}$$

in the flux vector, which contain $\rho H$, and the state vector
which contains $\rho E$. This may be remedied by introducing
a modified state vector

$$w_h = \begin{pmatrix} \rho \\ \rho u \\ \rho H \end{pmatrix}.$$

Then one introduces the linearization

$$f_R - f_L = A_h\left(w_{hR} - w_{hL}\right).$$

Here $A_h$ may be calculated in the same way as the standard
Roe linearization. Introduce the weighted averages
defined by equation (27). Then

$$A_h = \begin{pmatrix} 0 & 1 & 0 \\ -\frac{\gamma+1}{\gamma}\frac{u^2}{2} & \frac{\gamma+1}{\gamma} u & \frac{\gamma-1}{\gamma} \\ -uH & H & u \end{pmatrix}.$$

The eigenvalues of $A_h$ are $u$, $\lambda^+$ and $\lambda^-$ where

$$\lambda^{\pm} = \frac{\gamma+1}{2\gamma} u \pm \sqrt{\left(\frac{\gamma+1}{2\gamma} u\right)^2 + \frac{c^2 - u^2}{\gamma}}. \tag{37}$$

Now both CUSP and characteristic schemes which preserve
constant stagnation enthalpy in steady flow can be
constructed from the modified Jacobian matrix $A_h$ [80].
These schemes also produce a discrete shock structure
with one interior point in steady flow. Then one arrives at
four variations with this property, which can conveniently
be distinguished as the E- and H-CUSP schemes, and the
E- and H-characteristic schemes.
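Formula (37) can be verified against the matrix $A_h$ directly. The sketch below is an illustration under stated assumptions: a perfect gas with $\gamma = 1.4$, with $H$ recovered from $u$ and $c$ through $c^2 = (\gamma - 1)(H - u^2/2)$; the helper names are hypothetical. It evaluates $\det(A_h - \lambda I)$ at the three claimed eigenvalues.

```python
import math

GAMMA = 1.4  # assumed ratio of specific heats


def modified_jacobian(u, h):
    """A_h = df/dw_h for the modified state w_h = (rho, rho*u, rho*H)."""
    a = (GAMMA + 1.0) / GAMMA
    b = (GAMMA - 1.0) / GAMMA
    return [
        [0.0, 1.0, 0.0],
        [-0.5 * a * u * u, a * u, b],
        [-u * h, h, u],
    ]


def char_poly(m, lam):
    """det(m - lam*I) for a 3x3 matrix, by cofactor expansion."""
    a = [[m[i][j] - (lam if i == j else 0.0) for j in range(3)] for i in range(3)]
    return (a[0][0] * (a[1][1] * a[2][2] - a[1][2] * a[2][1])
            - a[0][1] * (a[1][0] * a[2][2] - a[1][2] * a[2][0])
            + a[0][2] * (a[1][0] * a[2][1] - a[1][1] * a[2][0]))


def eigenvalues_37(u, c):
    """The eigenvalues u, lambda+, lambda- given by equation (37)."""
    k = (GAMMA + 1.0) / (2.0 * GAMMA) * u
    root = math.sqrt(k * k + (c * c - u * u) / GAMMA)
    return (u, k + root, k - root)


u, c = 0.6, 1.0
h = c * c / (GAMMA - 1.0) + 0.5 * u * u
residuals = [char_poly(modified_jacobian(u, h), lam) for lam in eigenvalues_37(u, c)]
sonic = eigenvalues_37(1.0, 1.0)
print(residuals, sonic)
```

At the sonic point $u = c$ the square root reduces to $\frac{\gamma+1}{2\gamma}u$, so $\lambda^-$ vanishes there and changes sign, in the same way that $u - c$ does for the standard Jacobian.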


4.5  Multidimensional  Schemes 

The simplest approach to the treatment of multi-dimensional
problems on structured meshes is to apply
the one-dimensional construction separately in each mesh
direction. On triangulated meshes in two or three dimensions
the SLIP and USLIP constructions may also be
implemented along the mesh edges [79]. A substantial
body of current research is directed toward the implementation
of truly multi-dimensional upwind schemes in
which the upwind biasing is determined by properties of
the flow rather than the mesh. A thorough review is given
by Pailliere and Deconinck in reference [132].

Residual distribution schemes are an attractive approach
for triangulated meshes. In these the residual defined by
the space derivatives is evaluated for each cell, and then
distributed to the vertices with weights which depend on
the direction of convection. For a scalar conservation
law the weights can be chosen to maintain positivity with
minimum cross diffusion in the direction normal to the
flow. For the Euler equations the residual can be linearized
by assuming that the parameter vector with components
$\sqrt{\rho}$, $\sqrt{\rho}\,u_i$, and $\sqrt{\rho}\,H$ varies linearly over the cell. Then

$$\frac{\partial f_j(w)}{\partial x_j} = A_j \frac{\partial w}{\partial x_j},$$

where the Jacobian matrices $A_j = \partial f_j/\partial w$ are evaluated with
Roe averaging of the values of $w$ at the vertices. Waves
in the direction $n$ can then be expressed in terms of the
eigenvectors of $n_j A_j$, and a positive distribution scheme
is used for waves in preferred directions. The best choice
of these directions is the subject of ongoing research,
but preliminary results indicate the possibility of achieving
high resolution of shocks and contact discontinuities
which are not aligned with mesh lines [132].

Hirsch and Van Ransbeeck adopt an alternative approach
in which they directly construct directional diffusive terms
on structured meshes, with anti-diffusion controlled by
limiters based on comparisons of slopes in different directions
[60]. They also show promising results in calculations
of nozzles with multiply reflected oblique shocks.


4.5.1  High  Order  Godunov  Schemes,  and  Kinetic  Flux 
Splitting 

A substantial body of current research is directed toward
the implementation of truly multi-dimensional upwind
schemes [59, 135, 101]. Reference [132] provides a thorough
review of recent developments in this field. Some of
the most impressive simulations of time dependent flows
with strong shock waves have been achieved with higher
order Godunov schemes [180]. In these schemes the average
value in each cell is updated by applying the integral
conservation law using interface fluxes predicted from
the exact or approximate solution of a Riemann problem
between adjacent cells. A higher order estimate of the
solution is then reconstructed from the cell averages, and
slope limiters are applied to the reconstruction. An example
is the class of essentially non-oscillatory (ENO)
schemes, which can attain a very high order of accuracy
at the cost of a substantial increase in computational
complexity [32, 153, 151, 152]. Methods based on reconstruction
can also be implemented on unstructured
meshes [13, 12]. Recently there has been an increasing
interest in kinetic flux splitting schemes, which use solutions
of the Boltzmann equation or the BGK equation to
predict the interface fluxes [42, 36, 45, 136, 181].


4.6  Discretization  of  the  Viscous  Terms 


The discretization of the viscous terms of the Navier
Stokes equations requires an approximation to the velocity
derivatives $\partial u_i/\partial x_j$ in order to calculate the tensor $\sigma_{ij}$,
defined by equation (3). Then the viscous terms may be
included in the flux balance (4). In order to evaluate the
derivatives one may apply the Gauss formula to a control
volume $V$ with the boundary $S$

$$\int_V \frac{\partial u_i}{\partial x_j}\, dv = \int_S u_i n_j\, dS,$$

where $n_j$ is the outward normal. For a tetrahedral or
hexahedral cell this gives

$$\overline{\frac{\partial u_i}{\partial x_j}} = \frac{1}{\mathrm{vol}} \sum_{\mathrm{faces}} \bar{u}_i\, n_j\, S, \tag{38}$$

where $\bar{u}_i$ is an estimate of the average of $u_i$ over the
face. If $u$ varies linearly over a tetrahedral cell this is
exact. Alternatively, assuming a local transformation to
computational coordinates $\xi_j$, one may apply the chain
rule

$$\left[\frac{\partial u}{\partial x}\right] = \left[\frac{\partial u}{\partial \xi}\right]\left[\frac{\partial \xi}{\partial x}\right] = \left[\frac{\partial u}{\partial \xi}\right]\left[\frac{\partial x}{\partial \xi}\right]^{-1}. \tag{39}$$

Here the transformation derivatives $\partial x_i/\partial \xi_j$ can be evaluated
by the same finite difference formulas as the velocity
derivatives $\partial u_i/\partial \xi_j$. In this case $\partial u/\partial \xi$ is exact if $u$ is a linearly
varying function.

For a cell-centered discretization (figure 6a) $\partial u/\partial \xi$ is needed
at each face. The simplest procedure is to evaluate $\partial u/\partial \xi$
in each cell, and to average $\partial u/\partial \xi$ between the two cells
on either side of a face [87]. The resulting discretization
does not have a compact stencil, and supports undamped
oscillatory modes. In a one-dimensional calculation, for
example, $\partial^2 u/\partial x^2$ would be discretized as
$\frac{u_{i+2} - 2u_i + u_{i-2}}{4\Delta x^2}$. In
order to produce a compact stencil $\partial u/\partial x$ may be estimated
from a control volume centered on each face, using formulas
(38) or (39) [144]. This is computationally expensive
because the number of faces is much larger than the number
of cells. In a hexahedral mesh with a large number of
vertices the number of faces approaches three times the
number of cells.

This motivates the introduction of dual meshes for the
evaluation of the velocity derivatives and the flux balance
as sketched in figure 6. The figure shows both
cell-centered and cell-vertex schemes. The dual mesh
connects cell centers of the primary mesh. If there is a
kink in the primary mesh, the dual cells should be formed
by assembling contiguous fractions of the neighboring
primary cells. On smooth meshes comparable results are
obtained by either of these formulations [114, 115, 107].
If the mesh has a kink the cell-vertex scheme has the
advantage that the derivatives $\partial u_i/\partial x_j$ are calculated in the
interior of a regular cell, with no loss of accuracy.


Figure 6: Viscous discretizations for cell-centered and
cell-vertex algorithms. 6a: Cell-centered scheme, $\sigma_{ij}$
evaluated at vertices of the primary mesh. 6b: Cell-vertex
scheme, $\sigma_{ij}$ evaluated at cell centers of the primary mesh.
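The undamped oscillatory mode of the averaged cell-centered stencil can be seen directly in one dimension. The following sketch (function names and sample data are illustrative) applies both the wide stencil $(u_{i+2} - 2u_i + u_{i-2})/(4\Delta x^2)$ and the compact stencil $(u_{i+1} - 2u_i + u_{i-1})/\Delta x^2$ to a sawtooth mode and to a smooth quadratic.

```python
def wide_second_derivative(u, i, dx):
    """Stencil obtained by averaging cell gradients to the faces, then differencing."""
    return (u[i + 2] - 2.0 * u[i] + u[i - 2]) / (4.0 * dx * dx)


def compact_second_derivative(u, i, dx):
    """Stencil obtained from gradients evaluated at the faces themselves."""
    return (u[i + 1] - 2.0 * u[i] + u[i - 1]) / (dx * dx)


dx = 0.1
i = 4
sawtooth = [(-1.0) ** k for k in range(9)]   # odd-even decoupled mode
smooth = [(k * dx) ** 2 for k in range(9)]   # u = x^2, exact second derivative 2

wide_on_sawtooth = wide_second_derivative(sawtooth, i, dx)
compact_on_sawtooth = compact_second_derivative(sawtooth, i, dx)
wide_on_smooth = wide_second_derivative(smooth, i, dx)
compact_on_smooth = compact_second_derivative(smooth, i, dx)
print(wide_on_sawtooth, compact_on_sawtooth, wide_on_smooth, compact_on_smooth)
```

Both stencils reproduce the second derivative of the quadratic, but the wide stencil returns zero for the sawtooth, so a viscous term built this way provides no damping of that mode, whereas the compact stencil responds with $-4/\Delta x^2$.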

A desirable property is that a linearly varying velocity distribution,
as in a Couette flow, should produce a constant
stress and hence an exact stress balance. This property is
not necessarily satisfied in general by finite difference or
finite volume schemes on curvilinear meshes. The characterization
k-exact has been proposed for schemes that
are exact for polynomials of degree k. The cell-vertex finite
volume scheme is linearly exact if the derivatives are
evaluated by equation (39), since then $\partial u_i/\partial x_j$ is exactly evaluated
as a constant, leading to constant viscous stresses
$\sigma_{ij}$, and an exact viscous stress balance. This remains
true when there is a kink in the mesh, because the summation
of constant stresses over the faces of the kinked
control volume sketched in figure 6 still yields a perfect
balance. The use of equation (39) to evaluate $\partial u_i/\partial x_j$, however,
requires the additional calculation or storage of the
nine metric quantities in each cell, whereas equation
(38) can be evaluated from the same face areas that are
used for the flux balance.
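The exactness of formula (38) for a linear field on a tetrahedron can be checked with a few lines of vector arithmetic. This is a sketch under stated assumptions: the face average $\bar{u}$ is taken as the mean of the three vertex values (exact for a linear field), and the tetrahedron and the field are arbitrary illustrative choices.

```python
def sub(a, b):
    return [a[k] - b[k] for k in range(3)]


def dot(a, b):
    return sum(a[k] * b[k] for k in range(3))


def cross(a, b):
    return [a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0]]


def gauss_gradient(verts, values):
    """Gradient from (1/vol) * sum over faces of u_face * (outward normal * area)."""
    e1, e2, e3 = (sub(verts[k], verts[0]) for k in (1, 2, 3))
    vol = abs(dot(e1, cross(e2, e3))) / 6.0
    grad = [0.0, 0.0, 0.0]
    for opp in range(4):                       # face opposite vertex `opp`
        a, b, c = [k for k in range(4) if k != opp]
        area_vec = [0.5 * x for x in cross(sub(verts[b], verts[a]), sub(verts[c], verts[a]))]
        if dot(area_vec, sub(verts[a], verts[opp])) < 0.0:
            area_vec = [-x for x in area_vec]  # make the normal point away from `opp`
        u_face = (values[a] + values[b] + values[c]) / 3.0
        for k in range(3):
            grad[k] += u_face * area_vec[k] / vol
    return grad


g_exact = [2.0, -1.0, 0.5]                     # gradient of the test field
verts = [[0.0, 0.0, 0.0], [1.0, 0.2, 0.1], [0.3, 1.1, -0.2], [0.1, 0.4, 0.9]]
values = [3.0 + dot(g_exact, p) for p in verts]
grad = gauss_gradient(verts, values)
print(grad)
```

Only the face area vectors enter, which is the sense in which (38) reuses the geometric data already needed for the flux balance.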

In the case of an unstructured mesh, the weak form (6)
leads to a natural discretization with linear elements, in
which the piecewise linear approximation yields a constant
stress in each cell. This method yields a representation
which is globally correct when averaged over the
cells, a result that can be proved by energy estimates for elliptic
problems [164]. It should be noted, however, that it
yields formulas that are not necessarily locally consistent
with the differential equations, if Taylor series expansions
are substituted for the solution at the vertices appearing
in the local stencil. Figure 7 illustrates the discretization
of the Laplacian $u_{xx} + u_{yy}$ which is obtained with linear
elements. It shows a particular triangulation such that
the approximation is locally consistent with $u_{xx} + 3u_{yy}$.
Thus the use of an irregular triangulation in the boundary
layer may significantly degrade the accuracy.


Figure 7: Example of discretization of $u_{xx} + u_{yy}$ on a triangular
mesh, showing the coefficients resulting from linear elements.
The discretization is locally equivalent to the
approximation $u_{xx} + 3u_{yy}$.


4.7  Time  Stepping  Schemes 

If the space discretization procedure is implemented separately,
it leads to a set of coupled ordinary differential
equations, which can be written in the form

$$\frac{dw}{dt} + R(w) = 0, \tag{40}$$

where $w$ is the vector of the flow variables at the mesh
points, and $R(w)$ is the vector of the residuals, consisting
of the flux balances defined by the space discretization
scheme, together with the added dissipative terms. If the
objective is simply to reach the steady state and details
of the transient solution are immaterial, the time-stepping
scheme may be designed solely to maximize the rate of
convergence. The first decision that must be made is
whether to use an explicit scheme, in which the space
derivatives are calculated from known values of the flow
variables at the beginning of the time step, or an implicit
scheme, in which the formulas for the space derivatives
include as yet unknown values of the flow variables at
the end of the time step, leading to the need to solve
coupled equations for the new values. The permissible
time step for an explicit scheme is limited by the
Courant-Friedrichs-Lewy (CFL) condition, which states
that a difference scheme cannot be a convergent and stable
approximation unless its domain of dependence contains
the domain of dependence of the corresponding differential
equation. One can anticipate that implicit schemes
will yield convergence in a smaller number of time steps,
because the time step is no longer constrained by the CFL
condition. Implicit schemes will be efficient, however,
only if the decrease in the number of time steps outweighs
the increase in the computational effort per time step
consequent upon the need to solve coupled equations. The
prototype implicit scheme can be formulated by estimating
$\partial w/\partial t$ at $t + \mu\Delta t$ as a linear combination of $R(w^n)$ and
$R(w^{n+1})$. The resulting equation

$$w^{n+1} = w^n - \Delta t\left\{(1 - \mu) R(w^n) + \mu R(w^{n+1})\right\}$$

can be linearized as

$$\left(I + \mu \Delta t \frac{\partial R}{\partial w}\right)\delta w + \Delta t\, R(w^n) = 0.$$

If one sets $\mu = 1$ and lets $\Delta t \to \infty$ this reduces to the
Newton iteration, which has been successfully used in
two-dimensional calculations [173, 50]. In the three-dimensional
case with, say, an $N \times N \times N$ mesh, the
bandwidth of the matrix that must be inverted is of order
$N^2$. Direct inversion requires a number of operations
proportional to the number of unknowns multiplied by
the square of the bandwidth, of the order of $N^7$. This is
prohibitive, and forces recourse to either an approximate
factorization method or an iterative solution method.
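The limit $\mu = 1$, $\Delta t \to \infty$ can be illustrated on a scalar problem. The sketch below is only a toy under stated assumptions: the residual $R(w) = w^2 - 2$ and all function names are chosen here for demonstration, and the "matrix inversion" is a scalar division.

```python
def implicit_step(w, dt, mu, residual, jacobian):
    """Solve (1 + mu*dt*dR/dw) * dw = -dt * R(w) for a scalar unknown and update w."""
    dw = -dt * residual(w) / (1.0 + mu * dt * jacobian(w))
    return w + dw


def newton_step(w, residual, jacobian):
    """Classical Newton update for R(w) = 0."""
    return w - residual(w) / jacobian(w)


def residual(w):
    return w * w - 2.0          # steady state at w = sqrt(2)


def jacobian(w):
    return 2.0 * w


w_large_dt = implicit_step(1.0, 1.0e12, 1.0, residual, jacobian)
w_newton = newton_step(1.0, residual, jacobian)

# Iterating the implicit step with a very large time step converges quadratically.
w = 1.0
for _ in range(6):
    w = implicit_step(w, 1.0e12, 1.0, residual, jacobian)
print(w_large_dt, w_newton, w)
```

With a small time step the same update behaves like a damped iteration that follows the transient, which is the trade-off between robustness and convergence rate that motivates intermediate choices of $\Delta t$.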

Alternating direction methods, which introduce factors
corresponding to each coordinate, are widely used for
structured meshes [17, 137]. They cannot be implemented
on unstructured tetrahedral meshes that do not
contain identifiable mesh directions, although other decompositions
are possible [108]. If one chooses to adopt
the iterative solution technique, the principal alternatives
are variants of the Gauss-Seidel and Jacobi methods. A
symmetric Gauss-Seidel method with one iteration per
time step is essentially equivalent to an approximate
lower-upper (LU) factorization of the implicit scheme
[86, 125, 31, 184]. On the other hand, the Jacobi method
with a fixed number of iterations per time step reduces
to a multistage explicit scheme, belonging to the general
class of Runge-Kutta schemes [33]. Schemes of this
type have proved very effective for a wide variety of problems,
and they have the advantage that they can be applied
equally easily on both structured and unstructured meshes
[84, 67, 69, 145].

If one reduces the linear model problem corresponding to
(40) to an ordinary differential equation by substituting a
Fourier mode, the resulting Fourier symbol has
an imaginary part proportional to the wave speed, and
a negative real part proportional to the diffusion. Thus
the time stepping scheme should have a stability region
which contains substantial intervals of both the negative
real axis and the imaginary axis. To achieve this it pays
to treat the convective and dissipative terms in a distinct
fashion. Thus the residual is split as

$$R(w) = Q(w) + D(w),$$

where $Q(w)$ is the convective part and $D(w)$ the dissipative
part. Denote the time level $n\Delta t$ by a superscript $n$.
Then the multistage time stepping scheme is formulated
as

$$w^{(n+1,0)} = w^n$$
$$\cdots$$
$$w^{(n+1,k)} = w^n - \alpha_k \Delta t\left(Q^{(k-1)} + D^{(k-1)}\right)$$
$$\cdots$$
$$w^{n+1} = w^{(n+1,m)},$$

where the superscript $k$ denotes the $k$-th stage, $\alpha_m = 1$, and

$$Q^{(0)} = Q(w^n), \qquad D^{(0)} = D(w^n)$$
$$\cdots$$
$$Q^{(k)} = Q\left(w^{(n+1,k)}\right)$$
$$D^{(k)} = \beta_k D\left(w^{(n+1,k)}\right) + (1 - \beta_k) D^{(k-1)}.$$

The coefficients $\alpha_k$ are chosen to maximize the stability
interval along the imaginary axis, and the coefficients
$\beta_k$ are chosen to increase the stability interval along the
negative real axis.

These schemes do not fall within the standard framework
of Runge-Kutta schemes, and they have much larger stability
regions [69]. Two schemes which have been found
to be particularly effective are tabulated below. The first
is a four-stage scheme with two evaluations of dissipation.
Its coefficients are

$$\begin{array}{ll} \alpha_1 = \frac13 & \beta_1 = 1 \\ \alpha_2 = \frac{4}{15} & \beta_2 = \frac12 \\ \alpha_3 = \frac59 & \beta_3 = 0 \\ \alpha_4 = 1 & \beta_4 = 0 \end{array} \tag{41}$$

The second is a five-stage scheme with three evaluations
of dissipation. Its coefficients are

$$\begin{array}{ll} \alpha_1 = \frac14 & \beta_1 = 1 \\ \alpha_2 = \frac16 & \beta_2 = 0 \\ \alpha_3 = \frac38 & \beta_3 = 0.56 \\ \alpha_4 = \frac12 & \beta_4 = 0 \\ \alpha_5 = 1 & \beta_5 = 0.44 \end{array} \tag{42}$$
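The stability intervals of these schemes can be explored on the scalar model problem $dw/dt = \lambda w$. The sketch below is an illustration rather than a definitive implementation: it assumes that the imaginary part of $z = \lambda\Delta t$ represents convection and the real part dissipation, and it adopts one common reading of the blending rule in which $\beta_k$ weights the dissipation evaluated on the state entering stage $k$ (so that $\beta_1 = 1$ uses $D(w^n)$). The function name is hypothetical.

```python
ALPHA4 = [1.0 / 3.0, 4.0 / 15.0, 5.0 / 9.0, 1.0]
BETA4 = [1.0, 0.5, 0.0, 0.0]
ALPHA5 = [1.0 / 4.0, 1.0 / 6.0, 3.0 / 8.0, 1.0 / 2.0, 1.0]
BETA5 = [1.0, 0.0, 0.56, 0.0, 0.44]


def multistage_amplification(z, alphas, betas):
    """Amplification factor of the multistage scheme for dw/dt = lambda*w, z = lambda*dt."""
    z = complex(z)
    q = 1j * z.imag      # convective symbol:   -dt * Q(w) = q * w
    d = z.real           # dissipative symbol:  -dt * D(w) = d * w
    w = 1.0 + 0.0j       # w^(n+1,0) = w^n, normalised to one
    diss = 0.0           # blended dissipation carried from stage to stage
    for alpha, beta in zip(alphas, betas):
        # assumption: beta_k blends the dissipation of the state entering stage k
        diss = beta * d * w + (1.0 - beta) * diss
        w = 1.0 + alpha * (q * w + diss)
    return w


g5_at_4 = abs(multistage_amplification(4j, ALPHA5, BETA5))
g5_at_3 = abs(multistage_amplification(3j, ALPHA5, BETA5))
g5_beyond = abs(multistage_amplification(4.2j, ALPHA5, BETA5))
g4_at_3 = abs(multistage_amplification(3j, ALPHA4, BETA4))
print(g5_at_4, g5_at_3, g5_beyond, g4_at_3)
```

On the imaginary axis the amplification factor of the five-stage scheme has modulus one at $|z| = 4$ and that of the four-stage scheme at $|z| = 3$, so under these assumptions the schemes admit Courant numbers of 4 and 3 for pure convection. Calling the function with a negative real part in `z` shows the effect of the $\beta_k$ on the dissipative terms.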


4.8  Multigrid  Methods 

4.8.1  Acceleration  of  Steady  Flow  Calculations 

Radical improvements in the rate of convergence to a
steady state can be realized by the multigrid time-stepping
technique. The concept of acceleration by the introduction
of multiple grids was first proposed by Fedorenko
[48]. There is by now a fairly well-developed theory
of multigrid methods for elliptic equations based on the
concept that the updating scheme acts as a smoothing operator
on each grid [24, 53]. This theory does not hold for
hyperbolic systems. Nevertheless, it seems that it ought
to be possible to accelerate the evolution of a hyperbolic
system to a steady state by using large time steps on coarse
grids so that disturbances will be more rapidly expelled
through the outer boundary. Various multigrid time-stepping
schemes designed to take advantage of this effect
have been proposed [124, 65, 55, 71, 29, 6, 57, 83, 93].

One can devise a multigrid scheme using a sequence of
independently generated coarser meshes by eliminating
alternate points in each coordinate direction. In order to
give a precise description of the multigrid scheme, subscripts
may be used to indicate the grid. Several transfer
operations need to be defined. First the solution vector on
grid $k$ must be initialized as

$$w_k^{(0)} = T_{k,k-1} w_{k-1},$$

where $w_{k-1}$ is the current value on grid $k-1$, and $T_{k,k-1}$
is a transfer operator. Next it is necessary to transfer a
residual forcing function such that the solution on grid $k$ is
driven by the residuals calculated on grid $k-1$. This can
be accomplished by setting

$$P_k = Q_{k,k-1} R_{k-1}\left(w_{k-1}\right) - R_k\left[w_k^{(0)}\right],$$

where $Q_{k,k-1}$ is another transfer operator. Then $R_k(w_k)$
is replaced by $R_k(w_k) + P_k$ in the time-stepping scheme.
Thus, the multistage scheme is reformulated as

$$w_k^{(q+1)} = w_k^{(0)} - \alpha_{q+1} \Delta t_k\left[R_k^{(q)} + P_k\right].$$

The result then provides the initial data for grid
$k + 1$. Finally, the accumulated correction on grid $k$
has to be transferred back to grid $k - 1$ with the aid of
an interpolation operator $I_{k-1,k}$. With properly optimized
coefficients multistage time-stepping schemes can be very
efficient drivers of the multigrid process. A W-cycle of
the type illustrated in Figure 8 proves to be a particularly
effective strategy for managing the work split between the
meshes. In a three-dimensional case the number of cells
is reduced by a factor of eight on each coarser grid. On
examination of the figure, it can therefore be seen that the
work measured in units corresponding to a step on the fine
grid is of the order of

$$1 + 2/8 + 4/64 + \ldots < 4/3,$$

and consequently the very large effective time step of the
complete cycle costs only slightly more than a single time
step in the fine grid.


Figure 8: Multigrid W-cycle for managing the grid calculation.
E, evaluate the change in the flow for one step;
T, transfer the data without updating the solution.


4.8.2 Multigrid Implicit Schemes for Unsteady Flow

Time dependent calculations are needed for a number
of important applications, such as flutter analysis, or the
analysis of the flow past a helicopter rotor, in which the
stability limit of an explicit scheme forces the use of much
smaller time steps than would be needed for an accurate
simulation. In this situation a multigrid explicit scheme
can be used in an inner iteration to solve the equations of
a fully implicit time stepping scheme [74].

Suppose that (40) is approximated as

$$D_t w^{n+1} + R\left(w^{n+1}\right) = 0.$$

Here $D_t$ is a $k$-th order accurate backward difference operator
of the form

$$D_t = \frac{1}{\Delta t} \sum_{q=1}^{k} \frac{1}{q}\left(\Delta^-\right)^q,$$

where

$$\Delta^- w^{n+1} = w^{n+1} - w^n.$$
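The operator $D_t$ can be expanded into explicit weights on $w^{n+1}, w^n, w^{n-1}, \ldots$ by applying the binomial theorem to each power of $\Delta^-$. The sketch below does this in exact rational arithmetic; the function name is illustrative.

```python
from fractions import Fraction
from math import comb


def backward_difference_coefficients(k):
    """Weights c_0..c_k such that dt * D_t w^{n+1} = sum_m c_m * w^{n+1-m}."""
    coeffs = [Fraction(0)] * (k + 1)
    for q in range(1, k + 1):
        # (Delta^-)^q = (1 - S)^q with S the backward shift, expanded binomially
        for m in range(q + 1):
            coeffs[m] += Fraction((-1) ** m * comb(q, m), q)
    return coeffs


bdf1 = backward_difference_coefficients(1)
bdf2 = backward_difference_coefficients(2)
bdf3 = backward_difference_coefficients(3)
print(bdf1, bdf2, bdf3)
```

For $k = 2$ the weights are $\frac32, -2, \frac12$, so that $D_t w^{n+1} = \left(3w^{n+1} - 4w^n + w^{n-1}\right)/(2\Delta t)$; the weights always sum to zero, as they must for a consistent difference formula.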
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Applied  to  the  linear  differential  equation 


the  schemes  with  k=l,2  are  stable  for  all  a  At  in  the  left 
half  plane  (A-stable).  Dahlquist  has  shown  that  A-stable 
linear  multi-step  schemes  are  at  best  second  order  accurate 
[38].  Gear  however,  has  shown  that  the  schemes  with 
)b  <  6  are  stiffly  stable  [49],  and  one  of  the  higher  order 
schemes  may  oner  a  better  comproinise  between  accuracy 
and  stability,  depending  on  the  application. 

Equation  (40)  is  now  treated  as  a  modified  steady  state 
problem  to  be  solved  by  a  multigrid  scheme  using  variable 
local  time  steps  in  a  fictitious  time  t*.  For  example,  in  the 
case  k=2  one  solves 


dw 


=R*(w) 


where 


R*(.w)  +  R{w)  + 

lAt  At  2At 


in  the  different  coordinate  directions.  The  need  to  resolve 
the  boundary  layer  generally  forces  the  infroduction  of 
mesh  cells  with  very  high  aspect  ratios  near  the  bound- 
are,  and  these  can  lead  to  a  severe  reduction  in  the  rate 
of  convergence  to  a  steady  state.  Pierce  has  recently  ob¬ 
tained  impressive  results  using  diagonal  and  block- Jacobi 
preconditioners  which  include  the  mesh  intervals  [133]. 

An  alternative  approach  has  recently  been  proposed  by 
Ta’asan  [168],  in  which  the  equations  are  wntten  in  a 
canonical  form  which  separates  the  equations  describ¬ 
ing  acoustic  waves  from  those  describing  convection.  In 
terms  of  ffle  velocity  components  u,  v  and  the  vorticity 
w,  temperature  T,  entropy  s  and  total  enthalpy  H,  the 
equations  describing  steady  two-dimensional  flow  can  be 
written  as 

■  D\  D2  0 

_ d_  _i 

dy  dx 

0  0  -g  - 

0  0  0 

L  0  0  0 

where 


0  0 

0  0 

Di  iPa 
0 


’5^ 
0 


■  U  - 

V 

b) 

S 

.  R  J 

=0 


and  the  last  two  terms  are  treated  as  fixed  source  terms. 
The first term shifts the Fourier symbol of the equivalent
model problem to the left in the complex plane. While
this promotes stability, it may also require a limit to be
imposed on the magnitude of the local time step Δt* relative
to that of the implicit time step Δt. This may be
relieved by a point-implicit modification of the multistage
scheme [119]. In the case of problems with moving
boundaries  the  equations  must  be  modified  to  allow  for 
movement  and  deformation  of  the  mesh. 

This  method  has  proved  effective  for  the  calculation  of 
unsteady  flows  that  might  be  associated  with  wing  flutter 
[3, 4] and also in the calculation of unsteady incompressible
flows [18]. It has the advantage that it can be added
as  an  option  to  a  computer  program  which  uses  an  explicit 
multigrid  scheme,  allowing  it  to  be  used  for  the  efficient 
calculation  of  both  steady  and  unsteady  flows. 
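
The pseudo-time iteration described above can be illustrated on a scalar model problem. The following is a minimal sketch, not the multigrid scheme of the paper: it assumes the test equation dw/dt + R(w) = 0 with the hypothetical residual R(w) = lam * w, the second order backward difference (k=2), and a plain explicit Euler iteration in the fictitious time t*. The names `residual`, `dual_time_step`, `integrate` and the parameter `cfl` are illustrative assumptions.

```python
import math


def residual(w, lam):
    """Model residual R(w) = lam * w for the test equation dw/dt + R(w) = 0."""
    return lam * w


def dual_time_step(w_n, w_nm1, dt, lam, cfl=0.5, tol=1e-12, max_inner=1000):
    """Advance one implicit k=2 step by iterating in the fictitious time t*.

    The inner loop marches dw/dt* + R*(w) = 0 to a steady state, where
    R*(w) = 3/(2 dt) w + R(w) - 2/dt w^n + 1/(2 dt) w^{n-1}.
    """
    source = -2.0 / dt * w_n + 0.5 / dt * w_nm1  # fixed source terms
    # The 3/(2 dt) term stiffens the pseudo-time problem, so the local
    # step dt* is limited relative to dt (explicit Euler in t* is assumed).
    dt_star = cfl / (1.5 / dt + lam)
    w = w_n  # initial guess for w^{n+1}
    for _ in range(max_inner):
        r_star = 1.5 / dt * w + residual(w, lam) + source
        if abs(r_star) < tol:
            break
        w -= dt_star * r_star
    return w


def integrate(t_end=1.0, n_steps=20, lam=5.0):
    """Integrate from w(0) = 1 to t_end and return the final value."""
    dt = t_end / n_steps
    w_nm1 = 1.0
    w_n = math.exp(-lam * dt)  # start-up value taken from the exact solution
    for _ in range(n_steps - 1):
        w_n, w_nm1 = dual_time_step(w_n, w_nm1, dt, lam), w_n
    return w_n


if __name__ == "__main__":
    print(integrate(), math.exp(-5.0))
```

In this sketch the inner iteration plays the role that the multigrid cycle plays in the paper, so an existing steady state solver could in principle be reused by adding the extra terms of R* to its residual.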
Here the first two equations describe an elliptic system if
the flow is subsonic, while the remaining equations are
convective. Now separately optimized multigrid procedures
are used to solve the two sets of equations, which
are  essentially  decoupled. 


4.9  Preconditioning 

Another way to improve the rate of convergence to a
steady state is to multiply the space derivatives in equation
(1) by a preconditioning matrix P which is designed
to equalize the eigenvalues, so that all the waves can be
advanced with optimal time steps. A symmetric preconditioner
which equalizes the eigenvalues has been proposed
by Van Leer [102]. When the equations are written in
stream-aligned coordinates this has the form

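\[
P = \begin{bmatrix}
\frac{\tau}{\beta^2} M^2 & -\frac{\tau}{\beta^2} M & 0 & 0 & 0 \\
-\frac{\tau}{\beta^2} M & \frac{\tau}{\beta^2} + 1 & 0 & 0 & 0 \\
0 & 0 & \tau & 0 & 0 \\
0 & 0 & 0 & \tau & 0 \\
0 & 0 & 0 & 0 & 1
\end{bmatrix}
\]

where

\[
\beta = \tau = \sqrt{1 - M^2}, \quad \text{if } M < 1,
\]
\[
\beta = \sqrt{M^2 - 1}, \quad \tau = \sqrt{1 - \frac{1}{M^2}}, \quad \text{if } M \ge 1.
\]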

Turkel  has  proposed  an  asymmetric  preconditioner  which 
has also proved effective, particularly for flow at low Mach
numbers [172]. The use of these preconditioners can lead
to instability at stagnation points where there is a zero
eigenvalue which cannot be equalized with the eigenvalues
±c.
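
The motivation for equalizing the eigenvalues can be made concrete with a small sketch. It assumes the one-dimensional Euler equations, whose wave speeds are u - c, u and u + c, and measures the spread of these speeds as a function of Mach number. The function name `characteristic_condition_number` is an illustrative assumption and is not taken from the paper.

```python
def characteristic_condition_number(mach):
    """Ratio of the largest to the smallest wave speed magnitude in 1D flow.

    The wave speeds, scaled by the speed of sound c, are M - 1, M and M + 1.
    """
    speeds = [abs(mach - 1.0), abs(mach), abs(mach + 1.0)]
    smallest = min(speeds)
    if smallest == 0.0:
        return float("inf")  # a zero eigenvalue cannot be equalized
    return max(speeds) / smallest


if __name__ == "__main__":
    for mach in (0.01, 0.1, 0.5, 0.9, 2.0):
        print(mach, characteristic_condition_number(mach))
```

The ratio grows without bound as the Mach number approaches zero or one, which is where a single time step suited to the fastest wave advances the slowest wave very inefficiently.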

The  preconditioners  of  Van  Leer  and  Turkel  do  not  take 
account  of  the  effect  of  differences  in  the  mesh  intervals 


4.10  High  Order  Schemes  and  Mesh  Refinement 

The need both to improve the accuracy of computational
simulations and to assure known levels of accuracy is the
focus of ongoing research. The main routes to improving
the accuracy are to increase the order of the discrete
scheme and to reduce the mesh interval. High order difference
methods are most easily implemented on Cartesian,
or at least extremely smooth, grids. The expansion of
the stencil as the order is increased leads to the need for
complex boundary conditions. Compact schemes keep
the stencil as small as possible [140, 104, 28]. On simple
domains, spectral methods are particularly effective, especially
in the case of periodic boundary conditions, and
can be used to produce exponentially fast convergence of
the error as the mesh interval is decreased [127, 27]. A
compromise is to divide the field into subdomains and
introduce high order elements. This approach is used in
the spectral element method [92].

High order difference schemes and spectral methods have
proven particularly useful in direct Navier-Stokes simulations
of transient and turbulent flows. High order methods
are also beneficial in computational aero-acoustics, where
it is desired to track waves over long distances with minimum
error. If the flow contains shock waves or contact
discontinuities, the ENO method may be used to construct
high order non-oscillatory schemes.
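
The two routes to accuracy mentioned above, raising the order and reducing the mesh interval, can be compared in a short sketch. It assumes a periodic grid on [0, 2π), the test function sin(x), and standard second and fourth order central difference formulas; the function name `max_derivative_error` is an illustrative assumption.

```python
import math


def max_derivative_error(n, order):
    """Maximum error of a central difference derivative of sin(x).

    The grid is periodic with n points on [0, 2*pi); order is 2 or 4.
    """
    h = 2.0 * math.pi / n
    f = [math.sin(i * h) for i in range(n)]
    worst = 0.0
    for i in range(n):
        if order == 2:
            approx = (f[(i + 1) % n] - f[(i - 1) % n]) / (2.0 * h)
        elif order == 4:
            approx = (-f[(i + 2) % n] + 8.0 * f[(i + 1) % n]
                      - 8.0 * f[(i - 1) % n] + f[(i - 2) % n]) / (12.0 * h)
        else:
            raise ValueError("order must be 2 or 4")
        worst = max(worst, abs(approx - math.cos(i * h)))
    return worst


if __name__ == "__main__":
    for order in (2, 4):
        ratio = max_derivative_error(16, order) / max_derivative_error(32, order)
        print(order, ratio)
```

Halving the mesh interval reduces the error by a factor of about 4 for the second order formula and about 16 for the fourth order formula, at the price of a wider stencil.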

In multi-dimensional flow simulations, global reduction
of the mesh interval can be prohibitively expensive, motivating
the use of adaptive mesh refinement procedures
which reduce the local mesh width h if there is an indication
that the error is too large [21, 39, 109, 61, 138, 103].
In such h-refinement methods, simple error indicators
such as local solution gradients may be used. Alternatively,
the discretization error may be estimated by comparing
quantities calculated with two mesh widths, say
on the current mesh and a coarser mesh with double the
mesh interval. Procedures of this kind may also be used
to provide a posteriori estimates of the error once the
calculation is completed.
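
The idea of estimating the discretization error from two mesh widths can be sketched as follows. The example is an assumption chosen for brevity: it uses trapezoidal integration of sin(x) on [0, π], a second order method with exact value 2, rather than a flow calculation. The names `trapezoid` and `estimated_error` are illustrative.

```python
import math


def trapezoid(f, a, b, n):
    """Trapezoidal rule with n intervals of width h = (b - a) / n."""
    h = (b - a) / n
    total = 0.5 * (f(a) + f(b))
    for i in range(1, n):
        total += f(a + i * h)
    return h * total


def estimated_error(fine, coarse, order=2):
    """Estimate (exact - fine) from results on mesh widths h and 2h.

    Assumes the error behaves like C * h**order on both meshes.
    """
    return (fine - coarse) / (2 ** order - 1)


if __name__ == "__main__":
    fine = trapezoid(math.sin, 0.0, math.pi, 32)
    coarse = trapezoid(math.sin, 0.0, math.pi, 16)
    print(estimated_error(fine, coarse), 2.0 - fine)
```

The estimate needs no knowledge of the exact answer, which is what makes it usable both as a refinement indicator and as an a posteriori check.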

This  kind  of  local  adaptive  control  can  also  be  applied 
to  the  local  order  of  a  finite  element  method  to  produce 
a  p-refinement  method,  where  p  represents  the  order  of 
the polynomial basis functions. Finally, both h- and
p-refinement can be combined to produce an h-p method in
which  h  and  p  are  locally  optimized  to  yield  a  solution 
with  minimum  error  [126].  Such  methods  can  achieve 
exponentially  fast  convergence,  and  are  well  established 
in  computational  solid  mechanics. 


5. CURRENT STATUS OF NUMERICAL SIMULATION

This section presents some representative numerical results
which confirm the properties of the algorithms which
have been reviewed in the last section. These have been
drawn from the work of the author and his associates.
They also illustrate the kind of calculation which can be
performed in an industrial environment, where rapid turn
around is important to allow the quick assessment of design
changes, and computational costs must be limited.


5.1  One-dimensional  shock 


In order to verify the discrete structure of stationary
shocks, calculations were performed for a one-dimensional
problem with initial data containing left and
right states compatible with the Rankine-Hugoniot conditions.
An intermediate state consisting of the arithmetic
average of the left and right states was introduced at a
single cell in the center of the domain. With this intermediate
state the system is not in equilibrium, and the time
dependent equations were solved to find an equilibrium
solution with a stationary shock wave separating the left
and right states. Table 1 shows the result for a shock
wave at Mach 20. This calculation used the H-CUSP
scheme, which allows a solution with constant stagnation
enthalpy, with the limiter defined by equation (23),
and q=3. The formulation is described in detail in reference
[80]. The table shows the values of H, p, M and
the entropy $S = \log\frac{p}{\rho^\gamma} - \log\frac{p_L}{\rho_L^\gamma}$. A perfect one point
shock structure is displayed. The entropy is zero to 4
decimal places upstream of the shock, exhibits a slight
excursion at the interior point, and is constant to 4 decimal
places downstream of the shock. It may be noted that
the mass, momentum and energy of the initial data are
not compatible with the final equilibrium state. According
to conservation arguments the total mass, momentum
and energy must remain constant if the outflow flux $f_R$
remains equal to the inflow flux $f_L$. Therefore $f_R$ must
be allowed to vary according to an appropriate outflow
boundary condition to allow the total mass, momentum
and energy to be adjusted to values compatible with equilibrium.


 I       H           p          M          S
19    283.5000      1.0000    20.0000     0.0000
20    283.5000      1.0000    20.0000     0.0000
21    283.5000      1.0000    20.0000     0.0000
22    283.4960    307.4467     0.7229    40.3353
23    283.4960    466.4889     0.3804    37.6355
24    283.4960    466.4889     0.3804    37.6355
25    283.4960    466.4889     0.3804    37.6355

Table 1: Shock Wave at Mach 20
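
The left and right states in Table 1 can be cross-checked against the Rankine-Hugoniot relations for a stationary normal shock. The sketch below assumes a perfect gas with a ratio of specific heats of 1.4, which is not stated in this section, and the function name `normal_shock` is illustrative.

```python
import math


def normal_shock(mach1, gamma=1.4):
    """Return (p2/p1, rho2/rho1, M2) across a stationary normal shock.

    gamma = 1.4 is an assumed value for the ratio of specific heats.
    """
    m2 = mach1 * mach1
    p_ratio = 1.0 + 2.0 * gamma / (gamma + 1.0) * (m2 - 1.0)
    rho_ratio = (gamma + 1.0) * m2 / ((gamma - 1.0) * m2 + 2.0)
    mach2 = math.sqrt(((gamma - 1.0) * m2 + 2.0)
                      / (2.0 * gamma * m2 - (gamma - 1.0)))
    return p_ratio, rho_ratio, mach2


if __name__ == "__main__":
    print(normal_shock(20.0))
```

For an upstream Mach number of 20 and unit upstream pressure these relations give a downstream pressure of 466.5 and a downstream Mach number of about 0.3804, which can be compared with the downstream rows of Table 1.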


5.2 Euler Calculations for Airfoils and Wings

The results of transonic flow calculations for two well
known airfoils, the RAE 2822 and the NACA 0012, are
presented in figures (22-25). The H-CUSP scheme was
again used. The limiter defined by equation (23) was used
with q=3. The 5 stage time stepping scheme (42) was augmented
by the multigrid scheme described in section 4.2
to accelerate convergence to a steady state. The equations
were discretized on meshes with O-topology extending
out to a radius of about 100 chords. In each case the
calculations were performed on a sequence of successively
finer meshes from 40x8 to 320x64 cells, while the
multigrid cycles on each of these meshes descended to a
coarsest mesh of 10x2 cells. Figure 22 shows the inner
parts of the 160x32 meshes for the two airfoils. Figures
23-25 show the final results on 320x64 meshes for the
RAE 2822 airfoil at Mach .75 and 3° angle of attack, and
for the NACA 0012 airfoil at Mach .8 and 1.25° angle of
attack, and also at Mach .85 and 1° angle of attack. In the
pressure distributions the pressure coefficient
$C_p = \frac{p - p_\infty}{\frac{1}{2}\rho_\infty q_\infty^2}$
is plotted with the negative (suction) pressures upward, so
that the upper curve represents the flow over the upper side
of a lifting airfoil. The convergence histories show the
mean rate of change of the density, and also the total number
of supersonic points in the flow field, which provides
a useful measure of the global convergence of transonic
flow calculations such as these. In each case the convergence
history is shown for 100 cycles, while the pressure
distribution is displayed after a sufficient number of cycles
for its convergence. The pressure distribution of the
RAE 2822 airfoil converged in only 25 cycles. Convergence
was slower for the NACA 0012 airfoil. In the case
of flow at Mach .8 and 1.25° angle of attack, additional
cycles were needed to damp out a wave downstream of
the weak shock wave on the lower surface.

As a further check on accuracy the drag coefficient should
be zero in subsonic flow, or in shock free transonic flow.
Table 2 shows the computed drag coefficient on a sequence
of three meshes for three examples. The first two
are subsonic flows over the RAE 2822 and NACA 0012
airfoils at Mach .5 and 3° angle of attack. The third is the
flow over the shock free Korn airfoil at its design point
of Mach .75 and 0° angle of attack. In all three cases the
drag coefficient is calculated to be zero to four digits on a
160x32 mesh.


Mesh      RAE 2822    NACA 0012    Korn Airfoil
          Mach .50    Mach .50     Mach .75
          α 3°        α 3°         α 0°
40x8      .0062       .0047        .0098
80x16     .0013       .0008        .0017
160x32    .0000       .0000        .0000

Table 2: Drag Coefficient on a sequence of meshes


As a further test of the performance of the H-CUSP
scheme, the flow past the ONERA M6 wing was calculated
on a mesh with C-H topology and 192x32x48 =
294912 cells. Figure 26 shows the result at Mach .84
and 3.06° angle of attack. This again verifies the non-oscillatory
character of the solution, and the sharp resolution
of shock waves. In this case 50 cycles were sufficient
for convergence of the pressure distributions.

Figure 9 shows a calculation of the Northrop YF-23 by R. J.
Busch, Jr., who used the author’s FLO57 code to solve
the Euler equations [26]. Although an inviscid model of
the flow was used, it can be seen that the simulations are
in good agreement with wind tunnel measurements both
at Mach .90, with angles of attack of 0, 8 and 16 degrees,
and at Mach 1.5 with angles of attack of 0, 4 and 8 degrees.
At a high angle of attack the flow separates from
the leading edge, and this example shows that in situations
where the point of separation is fixed, an inviscid model
may still produce a useful prediction. Thus valuable information
for the aerodynamic design could be obtained
with a relatively inexpensive computational model.


Figure 9: Comparison of Experimental and Computed
Drag Rise Curve for the YF-23 (Supplied by R. J. Busch,
Jr.)


No. of Nodes    Seconds/Cycle    Speedup
 1              [illegible]      1.00
 2              18.[?]1          1.99
 4              9.11             [illegible]
 8              4.66             [illegible]
16              2.39             [illegible]

Table 3: AIRPLANE Parallel Performance on the SP2,
MD-11 Model


acceptable mesh. Figures 10 and 11 show calculations
for supersonic transport configurations which were performed
by Susan Cliff. The agreement with experimental
data is quite good, and it has also been possible to predict
the sonic boom signature [34]. Figure 12 shows an Euler
calculation for the McDonnell Douglas MD-11 with flow
through the engine nacelles, using 348407 mesh points and
2100466 tetrahedra. This calculation takes 4 hours on an
IBM 590 workstation. A parallel version of the code has
been developed in collaboration with W.S. Cheng, and the
same calculation can be performed in 20 minutes using
16 processors of an IBM SP2. The parallel speed-up for
the MD-11 is shown in table 3.
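
The timings quoted above can be turned into a speed-up and a parallel efficiency with a few lines of arithmetic. This is only a rough sketch: it treats the 4 hour workstation run as the serial reference, which is an assumption, since the text does not say that a single SP2 node runs at the same speed as the IBM 590. The function names are illustrative.

```python
def speedup(reference_time, parallel_time):
    """Ratio of the reference run time to the parallel run time."""
    return reference_time / parallel_time


def efficiency(reference_time, parallel_time, processors):
    """Speed-up divided by the number of processors used."""
    return speedup(reference_time, parallel_time) / processors


if __name__ == "__main__":
    # 4 hours = 240 minutes on the workstation, 20 minutes on 16 processors.
    print(speedup(240.0, 20.0), efficiency(240.0, 20.0, 16))
    # Seconds per cycle on 4 and 16 nodes, as legible in Table 3.
    print(speedup(9.11, 2.39))
```

With these figures the run time falls by a factor of 12 on 16 processors, corresponding to an efficiency of 75 percent under the stated assumption.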


Figure 11: Pressure Contours and Sonic Boom on a Representative
HSCT Configuration


Force Coefficients, Mach 2.1.


Figure  10:  Comparison  of  Experimental  and  Calculated 
Results  for  a  HSCT  Configuration 


The  next  figures  show  the  results  of  calculations  using  the 
AIRPLANE  code  developed  by  T.J.  Baker  and  the  author, 
to  solve  the  Euler  equations  on  an  unstructured  mesh.  This 
provides the flexibility to treat arbitrarily complex configurations
without the need to spend months developing an


5.3  Viscous  Flow  Calculations 

The next figures show viscous simulations based on the
solution of the Reynolds averaged Navier Stokes equations
with turbulence models. Figure 13 shows a two-dimensional
calculation for the RAE 2822 airfoil by L.
Martinelli. The vertical axis represents the negative pressure
coefficient, and there is a shock wave half way along
the upper surface. This example confirms that in the
absence of significant shock induced separation, simulations
performed on a sufficiently fine mesh (with 512 x 64
cells) can produce excellent agreement with experimental
data. Figure 21 shows a simulation of the McDonnell-Douglas
F-18 performed by R.M. Cummings, Y.M. Rizk,
L.B. Schiff and N.M. Chaderjian at NASA Ames [37].
They used a multiblock mesh with about 900000 mesh
points. While this is probably not enough for an accurate
quantitative prediction, the agreement with both the
experimental data and the visualization is quite good.

Figure 14 shows an unsteady flow calculation for a
pitching airfoil performed by J. Alonso using the code
UFLO82, which he jointly developed with L. Martinelli
and the author [4]. This uses the multigrid implicit scheme
described in Section 3.7.2 which allows the number of
time steps to be reduced from several thousand to 36 per
pitching cycle. The agreement with experimental data is
quite good.
Figure 12: Computed Pressure Field for a McDonnell
Douglas MD-11


5.4  Ship  Wave  Resistance  calculations 

Figures  15-17  show  the  results  of  an  application  of  the 
same  multigrid  finite  volume  techniques  to  the  calculation 
of  the  flow  past  a  naval  frigate,  using  a  code  which  was 
developed  by  J.  Farmer,  L.  Martinelli  and  the  author  [47]. 
The mesh was adjusted during the course of the calculation
to conform to the free surface in order to satisfy
the  exact  non-linear  boundary  condition,  while  artificial 
compressibility  was  used  to  treat  the  incompressible  flow 
equations. 


[Plot header: RAE 2822, Mach 0.729, CD 0.0071, NCYC 73; remaining values illegible]


6.  AERODYNAMIC  SHAPE  OPTIMIZATION 

6.1  Optimization  and  Design 

Traditionally the process of selecting design variations has
been carried out by trial and error, relying on the intuition
and experience of the designer. With currently available
equipment the turn around for numerical simulations is
becoming so rapid that it is feasible to examine an extremely
large number of variations. It is not at all likely
that repeated trials in an interactive design and analysis
procedure can lead to a truly optimum design. In order
to take full advantage of the possibility of examining a
large design space the numerical simulations need to be
combined with automatic search and optimization procedures.
This can lead to automatic design methods which
will fully realize the potential improvements in aerodynamic
efficiency.

The simplest approach to optimization is to define the
geometry through a set of design parameters, which may,
for example, be the weights $\alpha_i$ applied to a set of shape
functions $b_i(x)$ so that the shape is represented as

\[
f(x) = \sum \alpha_i b_i(x).
\]

Then a cost function I is selected which might, for example,
be the drag coefficient or the lift to drag ratio, and I
is regarded as a function of the parameters $\alpha_i$. The sensitivities
$\frac{\partial I}{\partial \alpha_i}$ may now be estimated by making a small
variation $\delta\alpha_i$ in each design parameter in turn and recalculating
the flow to obtain the change in I. Then

\[
\frac{\partial I}{\partial \alpha_i} \approx \frac{I(\alpha_i + \delta\alpha_i) - I(\alpha_i)}{\delta\alpha_i}.
\]

The gradient vector $\frac{\partial I}{\partial \alpha}$ may now be used to determine a
direction of improvement. The simplest procedure is to
make a step in the negative gradient direction by setting

\[
\alpha^{n+1} = \alpha^n + \delta\alpha, \qquad \delta\alpha = -\lambda \frac{\partial I}{\partial \alpha},
\]

so that to first order

\[
I + \delta I = I + \frac{\partial I^T}{\partial \alpha}\, \delta\alpha = I - \lambda \frac{\partial I^T}{\partial \alpha} \frac{\partial I}{\partial \alpha}.
\]


Figure 13: Two-Dimensional Turbulent Viscous Calculation
(by Luigi Martinelli)


More sophisticated search procedures may be used such as
quasi-Newton methods, which attempt to estimate the second
derivative $\frac{\partial^2 I}{\partial \alpha_i \partial \alpha_j}$ of the cost function from changes in
the gradient $\frac{\partial I}{\partial \alpha}$ in successive optimization steps. These
methods also generally introduce line searches to find
the minimum in the search direction which is defined at
each step. The main disadvantage of this approach is the
need for a number of flow calculations proportional to the
number of design variables to estimate the gradient. The
computational costs can thus become prohibitive as the
number of design variables is increased.
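
The finite difference procedure and its cost can be sketched in a few lines. The example below is an illustration under stated assumptions: `model_cost` is a hypothetical quadratic stand-in for a complete flow calculation, and the names `finite_difference_gradient` and `steepest_descent`, the step size and the value of `lam` are all illustrative choices.

```python
def finite_difference_gradient(cost, alpha, step=1e-6):
    """Estimate dI/dalpha_i by perturbing each design parameter in turn.

    This needs len(alpha) + 1 evaluations of the cost function.
    """
    base = cost(alpha)
    grad = []
    for i in range(len(alpha)):
        perturbed = list(alpha)
        perturbed[i] += step
        grad.append((cost(perturbed) - base) / step)
    return grad


def steepest_descent(cost, alpha, lam=0.1, iterations=200):
    """Repeat alpha <- alpha - lam * dI/dalpha for a fixed number of steps."""
    alpha = list(alpha)
    for _ in range(iterations):
        grad = finite_difference_gradient(cost, alpha)
        alpha = [a - lam * g for a, g in zip(alpha, grad)]
    return alpha


def model_cost(alpha):
    """Hypothetical stand-in for a flow solve followed by a cost evaluation."""
    targets = [1.0, -2.0, 0.5]
    return sum((a - t) ** 2 for a, t in zip(alpha, targets))


if __name__ == "__main__":
    print(steepest_descent(model_cost, [0.0, 0.0, 0.0]))
```

Each gradient costs one evaluation per design variable plus one for the baseline, so if every evaluation were a full flow solution the cost of a single descent step would scale with the number of design variables.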

An alternative approach is to cast the design problem as a
search for the shape that will generate the desired pressure
distribution. This approach recognizes that the designer
usually has an idea of the kind of pressure distribution
that will lead to the desired performance. Thus, it is
useful to consider the inverse problem of calculating the
shape that will lead to a given pressure distribution. The
method has the advantage that only one flow solution is
required to obtain the desired design. Unfortunately, a
physically realizable shape may not necessarily exist, unless
the pressure distribution satisfies certain constraints.
Thus the problem must be very carefully formulated; otherwise
it may be ill posed.

The difficulty that the target pressure may be unattainable
may be circumvented by treating the inverse problem as
a special case of the optimization problem, with a cost
function which measures the error in the solution of the
inverse problem. For example, if $p_d$ is the desired surface
pressure, one may take the cost function to be an integral
over the body surface of the square of the pressure
error,

\[
I = \frac{1}{2} \int_{\mathcal{B}} (p - p_d)^2 \, d\mathcal{B},
\]

or possibly a more general Sobolev norm of the pressure
error. This has the advantage of converting a possibly ill
posed problem into a well posed one. It has the disadvantage
that it incurs the computational costs associated with
optimization procedures.


Figure 14: Mach Number Contours. Pitching Airfoil
Case. Re = 1.0 x 10^6, M∞ = 0.796, Kc = 0.202.


6.2  Application  of  Control  Theory 

In order to reduce the computational costs, it turns out that
there are advantages in formulating both the inverse problem
and more general aerodynamic problems within the
framework of the mathematical theory for the control of
systems governed by partial differential equations [105].
A wing, for example, is a device to produce lift by controlling
the flow, and its design can be regarded as a problem
in the optimal control of the flow equations by variation
of the shape of the boundary. If the boundary shape is regarded
as arbitrary within some requirements of smoothness,
then the full generality of shapes cannot be defined
with a finite number of parameters, and one must use the
concept of the Fréchet derivative of the cost with respect
to a function. Clearly, such a derivative cannot be determined
directly by finite differences of the design parameters
because there are now an infinite number of these.
Using techniques of control theory, however, the gradient
can be determined indirectly by solving an adjoint equation
which has coefficients defined by the solution of the
flow equations. The cost of solving the adjoint equation
is comparable to that of solving the flow equations. Thus
the gradient can be determined with roughly the computational
costs of two flow solutions, independently of the
number of design variables, which may be infinite if the
boundary is regarded as a free surface.


Figure 15: Contours of Surface Wave Elevation for a
Combatant Ship


Figure 16: Contours of Surface Wave Elevation Near the
Transom Stern


Figure 17: Pressure Contours in the Bow Region

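For flow about an airfoil or wing, the aerodynamic properties
which define the cost function are functions of the
flow-field variables (w) and the physical location of the
boundary, which may be represented by the function $\mathcal{F}$,
say. Then

\[
I = I(w, \mathcal{F}),
\]

and a change in $\mathcal{F}$ results in a change

\[
\delta I = \frac{\partial I^T}{\partial w}\, \delta w + \frac{\partial I^T}{\partial \mathcal{F}}\, \delta\mathcal{F} \tag{43}
\]

in the cost function. Using control theory, the governing
equations of the flowfield are introduced as a constraint
in such a way that the final expression for the gradient
does not require reevaluation of the flowfield. In order to
achieve this $\delta w$ must be eliminated from (43). Suppose
that the governing equation R which expresses the dependence
of w and $\mathcal{F}$ within the flowfield domain D can be
written as

\[
R(w, \mathcal{F}) = 0. \tag{44}
\]

Then $\delta w$ is determined from the equation

\[
\delta R = \left[\frac{\partial R}{\partial w}\right] \delta w + \left[\frac{\partial R}{\partial \mathcal{F}}\right] \delta\mathcal{F} = 0. \tag{45}
\]

Next, introducing a Lagrange Multiplier $\psi$, we have

\[
\delta I = \frac{\partial I^T}{\partial w}\, \delta w + \frac{\partial I^T}{\partial \mathcal{F}}\, \delta\mathcal{F}
- \psi^T \left( \left[\frac{\partial R}{\partial w}\right] \delta w + \left[\frac{\partial R}{\partial \mathcal{F}}\right] \delta\mathcal{F} \right)
\]
\[
= \left\{ \frac{\partial I^T}{\partial w} - \psi^T \left[\frac{\partial R}{\partial w}\right] \right\} \delta w
+ \left\{ \frac{\partial I^T}{\partial \mathcal{F}} - \psi^T \left[\frac{\partial R}{\partial \mathcal{F}}\right] \right\} \delta\mathcal{F}.
\]

Choosing $\psi$ to satisfy the adjoint equation

\[
\left[\frac{\partial R}{\partial w}\right]^T \psi = \frac{\partial I}{\partial w} \tag{46}
\]

the first term is eliminated, and we find that

\[
\delta I = \mathcal{G}\, \delta\mathcal{F}, \tag{47}
\]

where

\[
\mathcal{G} = \frac{\partial I^T}{\partial \mathcal{F}} - \psi^T \left[\frac{\partial R}{\partial \mathcal{F}}\right].
\]

The advantage is that (47) is independent of $\delta w$, with the
result that the gradient of I with respect to an arbitrary
number of design variables can be determined without the
need for additional flow-field evaluations. In the case that
(44) is a partial differential equation, the adjoint equation
(46) is also a partial differential equation and appropriate
boundary conditions must be determined.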
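
The adjoint construction in equations (43) to (47) can be checked on a small discrete model. The sketch below is an assumption made for illustration and is not the flow problem of the paper: the constraint is the linear system R(w, alpha) = A w - B alpha = 0 with two state variables and three design variables, and the cost is half the squared distance of w from a target. The matrices `A` and `B`, the target `W_TARGET`, and all function names are hypothetical.

```python
def solve2(a, b):
    """Solve the 2x2 linear system a x = b by Cramer's rule."""
    det = a[0][0] * a[1][1] - a[0][1] * a[1][0]
    return [(b[0] * a[1][1] - a[0][1] * b[1]) / det,
            (a[0][0] * b[1] - b[0] * a[1][0]) / det]


A = [[4.0, 1.0], [2.0, 3.0]]               # dR/dw
B = [[1.0, 0.0, 2.0], [0.0, 1.0, -1.0]]    # dR/dalpha = -B
W_TARGET = [1.0, 2.0]


def flow_solve(alpha):
    """Solve the model constraint R(w, alpha) = A w - B alpha = 0 for w."""
    rhs = [sum(B[i][k] * alpha[k] for k in range(3)) for i in range(2)]
    return solve2(A, rhs)


def cost(alpha):
    """I = 1/2 |w - w_target|^2, evaluated after a model flow solve."""
    w = flow_solve(alpha)
    return 0.5 * sum((wi - ti) ** 2 for wi, ti in zip(w, W_TARGET))


def adjoint_gradient(alpha):
    """Gradient from one flow solve and one adjoint solve."""
    w = flow_solve(alpha)
    dI_dw = [wi - ti for wi, ti in zip(w, W_TARGET)]
    a_transpose = [[A[0][0], A[1][0]], [A[0][1], A[1][1]]]
    psi = solve2(a_transpose, dI_dw)       # adjoint equation (46)
    # G = dI/dalpha - psi^T dR/dalpha, with dI/dalpha = 0 and dR/dalpha = -B
    return [sum(psi[i] * B[i][k] for i in range(2)) for k in range(3)]


if __name__ == "__main__":
    print(adjoint_gradient([0.3, -0.2, 0.5]))
```

The gradient with respect to all three design variables is obtained from one solve with A and one with its transpose, whereas a finite difference estimate would need a separate solve for each variable.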


After  making  a  step  in  the  negative  gradient  direction, 
the  gradient  can  be  recalculated  and  the  process  repeated 
to  follow  a  path  of  steepest  descent  until  a  minimum  is 
reached.  In  order  to  avoid  violating  constraints,  such  as 
a  minimum  acceptable  wing  thickness,  the  gradient  may 
be  projected  into  the  allowable  subspace  within  which 
the  constraints  are  satisfied.  In  this  way  one  can  devise 
procedures  which  must  necessarily  converge  at  least  to  a 
local  minimum,  and  which  can  be  accelerated  by  the  use 
of more sophisticated descent methods such as conjugate
gradient or quasi-Newton algorithms. There is the possibility
of more than one local minimum, but in any case
the method will lead to an improvement over the original
design. Furthermore, unlike the traditional inverse algorithms,
any measure of performance can be used as the
cost  function. 


In reference [72] the author derived the adjoint equations
for transonic flows modelled by both the potential flow
equation and the Euler equations. The theory was developed
in terms of partial differential equations, leading
to an adjoint partial differential equation. In order to
obtain numerical solutions both the flow and the adjoint
equations must be discretized. The control theory might
be applied directly to the discrete flow equations which
result from the numerical approximation of the flow equations
by finite element, finite volume or finite difference
procedures. This leads directly to a set of discrete adjoint
equations with a matrix which is the transpose of the Jacobian
matrix of the full set of discrete nonlinear flow equations.
On a three-dimensional mesh with indices i, j, k the
individual adjoint equations may be derived by collecting
together all the terms multiplied by the variation $\delta w_{i,j,k}$
of the discrete flow variable $w_{i,j,k}$. The resulting discrete
adjoint equations represent a possible discretization of the
adjoint partial differential equation. If these equations are
solved exactly they can provide an exact gradient of the
inexact cost function which results from the discretization
of the flow equations. On the other hand any consistent
discretization of the adjoint partial differential equation
will yield the exact gradient in the limit as the mesh is
refined. The trade-off between the complexity of the adjoint
discretization, the accuracy of the resulting estimate
of the gradient, and its impact on the computational cost
to approach an optimum solution is a subject of ongoing
research.

The true optimum shape belongs to an infinitely dimensional
space of design parameters. One motivation for
developing the theory for the partial differential equations
of the flow is to provide an indication in principle
of how such a solution could be approached if sufficient
computational resources were available. Another motivation
is that it highlights the possibility of generating
ill posed formulations of the problem. For example, if
one attempts to calculate the sensitivity of the pressure
at a particular location to changes in the boundary shape,
there is the possibility that a shape modification could
cause a shock wave to pass over that location. Then the
sensitivity could become unbounded. The movement of
the shock, however, is continuous as the shape changes.
Therefore a quantity such as the drag coefficient, which
is determined by integrating the pressure over the surface,
also depends continuously on the shape. The adjoint
equation allows the sensitivity of the drag coefficient to
be determined without the explicit evaluation of pressure
sensitivities which would be ill posed.

The discrete adjoint equations, whether they are derived
directly or by discretization of the adjoint partial differential
equation, are linear. Therefore they could be solved
by direct numerical inversion. The cost of direct inversion
can become prohibitive, however, as the mesh is refined,
and it becomes more efficient to use iterative solution
methods. Moreover, because of the similarity of the adjoint
equations to the flow equations, the same iterative
methods which have been proved to be efficient for the
solution of the flow equations are efficient for the solution
of the adjoint equations.

The control theory formulation for optimal aerodynamic
design has proved effective in a variety of applications
[73, 77, 142]. The adjoint equations have also been used
by Ta’asan, Kuruvila and Salas [167], who have implemented
a one shot approach in which the constraint represented
by the flow equations is only required to be satisfied
by the final converged solution, and computational costs
are also reduced by applying multigrid techniques to the
geometry modifications as well as the solution of the flow
and adjoint equations. Pironneau has studied the use of
control theory for optimal shape design of systems governed
by elliptic equations [134], and more recently the
Navier-Stokes equations, and also wave reflection problems.
Adjoint methods have also been used by Baysal
and Eleshaky [16].


6.3 Three-Dimensional Design using the Euler Equations

In  order  to  illustrate  the  application  of  control  theory  to 
aerodynamic  design  problems,  this  section  treats  the  case 
of  three-dimensional  wing  design  using  the  inviscid  Eu¬ 
ler  equations  as  the  mathematics  model  for  compressible 
flow.  A  transformation  to  a  body-fitted  coordinate  system 
will  be  introduced,  so  that  variations  in  the  wing  shape  in¬ 
duce  corresponding  variations  in  the  computational  mesh. 
Thus  the  flow  is  determined  by  the  solution  of  the  trans¬ 
formed  equation  (5).  Let 


dXj\  ’ 

and 

Q=JK-\ 


Kir 


dXi 


,  J=det(A:) 


1-20 


The  elements  of  Q  are  the  coefficients  of  K,  and  in  a 
finite  volume  discretization  they  are  just  the  face  areas  of 
the  computational  cells  projected  in  the  xi,  X2,  and  X3 
directions.  Also  introduce  scaled  contravariant  velocity 
components 

*  TT  /~\ 

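The transformed equations can now be written as

$$\frac{\partial W}{\partial t} + \frac{\partial F_i}{\partial \xi_i} = 0, \tag{48}$$

where

$$W = J w$$

and

$$F_i = Q_{ij} f_j = \begin{bmatrix} \rho U_i \\ \rho U_i u_1 + Q_{i1}\, p \\ \rho U_i u_2 + Q_{i2}\, p \\ \rho U_i u_3 + Q_{i3}\, p \\ \rho U_i H \end{bmatrix}.$$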
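As a hedged illustration of these definitions, the following Python sketch builds $Q = J K^{-1}$ from the cofactors of a 3x3 mapping Jacobian and checks that the transformed flux $F_i = Q_{ij} f_j$ agrees with the component form written with the scaled contravariant velocity $U_i$. The sample matrix `K`, the flow state, and all function names are assumptions made for the check; none of this is taken from the author's code.

```python
import math

def adjugate3(K):
    """Q = J * K^{-1} for a 3x3 matrix K, built from the cofactors of K."""
    Q = [[0.0] * 3 for _ in range(3)]
    for i in range(3):
        for j in range(3):
            # Q[i][j] is the cofactor of K[j][i]; cyclic indexing supplies the sign.
            r1, r2 = (j + 1) % 3, (j + 2) % 3
            c1, c2 = (i + 1) % 3, (i + 2) % 3
            Q[i][j] = K[r1][c1] * K[r2][c2] - K[r1][c2] * K[r2][c1]
    return Q

def det3(K, Q):
    """J = det(K) by cofactor expansion along the first row of K."""
    return sum(K[0][j] * Q[j][0] for j in range(3))

def cartesian_flux(j, rho, u, p, H):
    """Euler flux vector f_j in the Cartesian direction x_j."""
    f = [rho * u[j], rho * u[j] * u[0], rho * u[j] * u[1],
         rho * u[j] * u[2], rho * u[j] * H]
    f[j + 1] += p  # pressure acts only on the j-th momentum component
    return f

def transformed_flux(i, Q, rho, u, p, H):
    """F_i written with the scaled contravariant velocity U_i = Q_ij u_j."""
    U = sum(Q[i][j] * u[j] for j in range(3))
    return [rho * U,
            rho * U * u[0] + Q[i][0] * p,
            rho * U * u[1] + Q[i][1] * p,
            rho * U * u[2] + Q[i][2] * p,
            rho * U * H]

# Arbitrary sample data (assumed values, for illustration only).
K = [[1.2, 0.3, 0.0], [-0.1, 0.9, 0.2], [0.0, 0.1, 1.1]]
rho, u, p, H = 1.1, [0.8, 0.1, -0.05], 0.7, 3.0

Q = adjugate3(K)
J = det3(K, Q)
# F_i assembled as the sum Q_ij f_j, and directly from the component form.
F_sum = [[sum(Q[i][j] * cartesian_flux(j, rho, u, p, H)[k] for j in range(3))
          for k in range(5)] for i in range(3)]
F_direct = [transformed_flux(i, Q, rho, u, p, H) for i in range(3)]
```

In a finite volume code the entries of `Q` would come from projected face areas rather than from an analytic Jacobian; the cofactor construction is used here only because it makes the identity $QK = JI$ easy to verify.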


Assume now that the new computational coordinate system
conforms to the wing in such a way that the wing surface $B_W$
is represented by $\xi_2 = 0$. Then the flow is determined as
the steady state solution of equation (48) subject to the flow
tangency condition

$$U_2 = 0 \quad \text{on } B_W. \tag{49}$$

At the far field boundary $B_F$, conditions are specified for
incoming waves, as in the two-dimensional case, while outgoing
waves are determined by the solution.

The weak form of the Euler equations for steady flow can be
written as

$$\int_V \frac{\partial \phi^T}{\partial \xi_i} F_i \, dV = \int_B n_i \phi^T F_i \, dB, \tag{50}$$

where the test vector $\phi$ is an arbitrary differentiable
function and $n_i$ is the outward normal at the boundary. If a
differentiable solution $w$ is obtained to this equation, it
can be integrated by parts to give

$$\int_V \phi^T \frac{\partial F_i}{\partial \xi_i} \, dV = 0,$$

and since this is true for any $\phi$, the differential form
can be recovered. If the solution is discontinuous, equation
(50) may be integrated by parts separately on either side of
the discontinuity to recover the shock jump conditions.

Suppose now that it is desired to control the surface pressure
by varying the wing shape. It is convenient to retain a fixed
computational domain. Variations in the shape then result in
corresponding variations in the mapping derivatives defined by
$K$. Introduce the cost function

$$I = \frac{1}{2} \iint_{B_W} (p - p_d)^2 \, d\xi_1 \, d\xi_3,$$

where $p_d$ is the desired pressure. The design problem is now
treated as a control problem where the control function is the
wing shape, which is to be chosen to minimize $I$ subject to
the constraints defined by the flow equations (48-50). A
variation in the shape will cause a variation $\delta p$ in the
pressure and consequently a variation in the cost function

$$\delta I = \iint_{B_W} (p - p_d) \, \delta p \, d\xi_1 \, d\xi_3. \tag{51}$$

Since $p$ depends on $w$ through the equation of state (2), the
variation $\delta p$ can be determined from the variation
$\delta w$. Define the Jacobian matrices

$$A_i = \frac{\partial f_i}{\partial w}, \qquad C_i = Q_{ij} A_j. \tag{52}$$


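The weak form of the equation for $\delta w$ in the steady
state becomes

$$\int_V \frac{\partial \phi^T}{\partial \xi_i} \delta F_i \, dV = \int_B (n_i \phi^T \delta F_i) \, dB,$$

where

$$\delta F_i = C_i \delta w + \delta Q_{ij} f_j,$$

which should hold for any differentiable test function $\phi$.
This equation may be added to the variation in the cost
function, which may now be written as

$$\delta I = \iint_{B_W} (p - p_d) \, \delta p \, d\xi_1 \, d\xi_3 - \int_V \frac{\partial \phi^T}{\partial \xi_i} \delta F_i \, dV + \int_B (n_i \phi^T \delta F_i) \, dB. \tag{53}$$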

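On the wing surface $B_W$, $n_1 = n_3 = 0$ and it follows from
equation (49) that

$$\delta F_2 = \begin{bmatrix} 0 \\ Q_{21}\, \delta p \\ Q_{22}\, \delta p \\ Q_{23}\, \delta p \\ 0 \end{bmatrix} + \begin{bmatrix} 0 \\ \delta Q_{21}\, p \\ \delta Q_{22}\, p \\ \delta Q_{23}\, p \\ 0 \end{bmatrix}. \tag{54}$$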


Since the weak equation for $\delta w$ should hold for an
arbitrary choice of the test vector $\phi$, we are free to
choose $\phi$ to simplify the resulting expressions. Therefore
we set $\phi = \psi$, where the costate vector $\psi$ is the
solution of the adjoint equation

$$\frac{\partial \psi}{\partial t} - C_i^T \frac{\partial \psi}{\partial \xi_i} = 0 \quad \text{in } V. \tag{55}$$

At the outer boundary incoming characteristics for $\psi$
correspond to outgoing characteristics for $\delta w$.
Consequently one can choose boundary conditions for $\psi$ such
that

$$n_i \psi^T C_i \delta w = 0.$$

Then if the coordinate transformation is such that $\delta Q$
is negligible in the far field, the only remaining boundary
term is

$$-\iint_{B_W} \psi^T \delta F_2 \, d\xi_1 \, d\xi_3.$$


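Thus by letting $\psi$ satisfy the boundary condition

$$Q_{21}\psi_2 + Q_{22}\psi_3 + Q_{23}\psi_4 = (p - p_d) \quad \text{on } B_W, \tag{56}$$

we find finally that

$$\delta I = -\int_V \frac{\partial \psi^T}{\partial \xi_i} \delta Q_{ij} f_j \, dV - \iint_{B_W} (\delta Q_{21}\psi_2 + \delta Q_{22}\psi_3 + \delta Q_{23}\psi_4)\, p \, d\xi_1 \, d\xi_3. \tag{57}$$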
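The structure of this result, in which the variation of the cost depends on the costate and on the variation of the metrics but not on $\delta w$, can be seen in a small discrete analogue. The sketch below is not the Euler adjoint: it assumes a hypothetical 2x2 linear "flow" system $A(s) w = b$ with a single shape parameter $s$, a cost $I = \frac{1}{2}|w - w_d|^2$, and an adjoint system $A^T \psi = w - w_d$ that plays the role of the adjoint equation and its boundary condition. All names and numbers are illustrative assumptions.

```python
def solve2(A, b):
    """Solve a 2x2 linear system by Cramer's rule."""
    det = A[0][0] * A[1][1] - A[0][1] * A[1][0]
    return [(b[0] * A[1][1] - A[0][1] * b[1]) / det,
            (A[0][0] * b[1] - A[1][0] * b[0]) / det]

B_RHS = [1.0, 2.0]      # fixed right-hand side (assumed)
W_TARGET = [0.2, 0.5]   # target state, analogous to p_d (assumed)

def system(s):
    """Constraint matrix A(s); only A[0][0] depends on the control s."""
    return [[2.0 + s, 1.0], [1.0, 3.0]]

def cost(s):
    """I(s) = 0.5 * |w(s) - w_target|^2 with w(s) from A(s) w = b."""
    w = solve2(system(s), B_RHS)
    return 0.5 * sum((wk - tk) ** 2 for wk, tk in zip(w, W_TARGET))

def adjoint_gradient(s):
    """dI/ds = -psi^T (dA/ds) w, where A^T psi = w - w_target."""
    A = system(s)
    w = solve2(A, B_RHS)
    A_T = [[A[0][0], A[1][0]], [A[0][1], A[1][1]]]
    psi = solve2(A_T, [w[0] - W_TARGET[0], w[1] - W_TARGET[1]])
    # dA/ds has a single non-zero entry, dA[0][0]/ds = 1.
    return -psi[0] * w[0]

s = 0.5
g = adjoint_gradient(s)
h = 1.0e-6
g_fd = (cost(s + h) - cost(s - h)) / (2.0 * h)   # finite-difference check
s_new = s - 1.0 * g                               # one descent step
```

One state solve and one adjoint solve give the gradient regardless of how many control parameters there are, which is the property that makes the approach attractive when the design has thousands of variables.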


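A convenient way to treat a wing is to introduce sheared
parabolic coordinates as shown in figure 18 through the
transformation

$$\begin{aligned}
x &= x_0(\zeta) + \tfrac{1}{2}\, a(\zeta) \left\{ \xi^2 - (\eta + S(\xi,\zeta))^2 \right\} \\
y &= y_0(\zeta) + a(\zeta)\, \xi\, (\eta + S(\xi,\zeta)) \\
z &= \zeta.
\end{aligned}$$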


Figure 18: Sheared Parabolic Mapping. 18a: x, y-Plane. 18b: ξ, η-Plane.
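The sheared parabolic transformation can be sketched in a few lines of Python. The geometry functions `x0`, `y0`, `a` and `S` below are hypothetical placeholders chosen only to exercise the formulas; the cross-check uses the equivalent complex form $x + iy = x_0 + i y_0 + \frac{1}{2} a(\zeta)\{\xi + i(\eta + S)\}^2$ at a fixed span station.

```python
import math

def sheared_parabolic(xi, eta, zeta, x0, y0, a, S):
    """Map computational (xi, eta, zeta) to Cartesian (x, y, z)."""
    s = eta + S(xi, zeta)                       # sheared normal coordinate
    x = x0(zeta) + 0.5 * a(zeta) * (xi ** 2 - s ** 2)
    y = y0(zeta) + a(zeta) * xi * s
    return x, y, zeta

# Hypothetical geometry functions (assumptions, not the author's wing).
x0 = lambda zeta: 0.6 * zeta                    # swept singular line
y0 = lambda zeta: 0.0
a = lambda zeta: 1.0 - 0.5 * zeta               # spanwise chord scaling
S = lambda xi, zeta: 0.05 * math.exp(-xi * xi)  # surface "bump"

xi, eta, zeta = 0.7, 0.2, 0.4
x, y, z = sheared_parabolic(xi, eta, zeta, x0, y0, a, S)

# Cross-check against the complex form of the parabolic mapping.
w = complex(x0(zeta), y0(zeta)) \
    + 0.5 * a(zeta) * complex(xi, eta + S(xi, zeta)) ** 2
```

Setting `eta = 0` traces the surface itself, so changing `S` moves the surface while the computational domain stays fixed.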


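In this transformation $x = x_1$, $y = x_2$, $z = x_3$ are the
Cartesian coordinates, and $\xi$ and $\eta + S$ correspond to
parabolic coordinates generated by the mapping

$$x + iy = x_0 + i y_0 + \tfrac{1}{2}\, a(\zeta) \left\{ \xi + i(\eta + S) \right\}^2$$

at a fixed span station $\zeta$. $x_0(\zeta)$ and $y_0(\zeta)$
are the coordinates of a singular line which is swept to lie
just inside the leading edge of a swept wing, while $a(\zeta)$
is a scale factor to allow for spanwise chord variations.

We now treat $S(\xi,\zeta)$ as the control. Substitution of
these formulas yields the variation in the form

$$\delta I = \iint \mathcal{G}(\xi,\zeta)\, \delta S(\xi,\zeta)\, d\xi\, d\zeta,$$

where the gradient $\mathcal{G}(\xi,\zeta)$ is obtained by
evaluating the integrals in equation (57). Thus to reduce $I$
we can choose

$$\delta S = -\lambda \mathcal{G},$$

where $\lambda$ is sufficiently small and non-negative. In
order to impose a thickness constraint we can define a baseline
surface $S_0(\xi,\zeta)$ below which $S(\xi,\zeta)$ is not
allowed to fall. Now we take $\lambda = \lambda(\xi,\zeta)$ as
a non-negative function such that

$$S(\xi,\zeta) + \delta S(\xi,\zeta) \ge S_0(\xi,\zeta). \tag{58}$$

Then the constraint is satisfied, while

$$\delta I = -\iint \lambda\, \mathcal{G}^2 \, d\xi\, d\zeta \le 0.$$

The costate solution $\psi$ is a legitimate test function for
the weak form of the flow equations only if it is
differentiable. Smoothness should also be preserved in the
redesigned shape. It is therefore crucially important to
introduce appropriate smoothing procedures. In order to avoid
discontinuities in the adjoint boundary condition which would
be caused by the appearance of shock waves, the cost function
for the target pressure may be modified to the form

$$I = \frac{1}{2} \iint \left\{ \lambda_1 Z^2 + \lambda_2 \left( \frac{\partial Z}{\partial \xi} \right)^2 \right\} d\xi\, d\zeta,$$

where $\lambda_1$ and $\lambda_2$ are parameters, and $Z$
satisfies the equation

$$\lambda_1 Z - \frac{\partial}{\partial \xi} \left( \lambda_2 \frac{\partial Z}{\partial \xi} \right) = p - p_d.$$

Then

$$\delta I = \iint \left( \lambda_1 Z\, \delta Z + \lambda_2 \frac{\partial Z}{\partial \xi} \frac{\partial}{\partial \xi} \delta Z \right) d\xi\, d\zeta = \iint Z \left( \lambda_1 \delta Z - \frac{\partial}{\partial \xi} \lambda_2 \frac{\partial}{\partial \xi} \delta Z \right) d\xi\, d\zeta = \iint Z\, \delta p \, d\xi\, d\zeta,$$

and the smooth quantity $Z$ replaces $p - p_d$ in the adjoint
boundary condition.

Independent movement of the boundary mesh points could produce
discontinuities in the designed shape. In order to prevent this
the gradient may also be smoothed. Both explicit and implicit
smoothing procedures are useful. Suppose that the movement of
the surface mesh points were defined by local B-splines. In the
case of a uniform one-dimensional mesh, a B-spline with a
displacement $d$ centered at the mesh point $i$ would produce
displacements $d/4$ at $i+1$ and $i-1$ and zero elsewhere,
while preserving continuity of the first and second
derivatives. Thus we can suppose that the discrete surface
displacement has the form

$$\delta S = B d,$$

where $B$ is a matrix with coefficients defined by the
B-splines, and $d_i$ is the displacement associated with the
B-spline centered at $i$. Then, using the discrete formulas, to
first order the change in the cost is

$$\delta I = \mathcal{G}^T \delta S = \mathcal{G}^T B d.$$

Thus the gradient with respect to the B-spline coefficients is
obtained by multiplying $\mathcal{G}$ by $B^T$, and a descent
step is defined by setting

$$d = -\lambda B^T \mathcal{G}, \qquad \delta S = B d = -\lambda B B^T \mathcal{G},$$

where $\lambda$ is sufficiently small and positive. The
coefficients of $B$ can be renormalized to produce unit row
sums. With a uniform mesh spacing in the computational domain
this formula is equivalent to the use of a gradient modified by
two passes of the explicit smoothing procedure

$$\bar{\mathcal{G}}_{i,k} = \frac{1}{6} \left( \mathcal{G}_{i+1,k} + 4\, \mathcal{G}_{i,k} + \mathcal{G}_{i-1,k} \right),$$

with a similar smoothing procedure in the $k$ direction.

Implicit smoothing may also be used. The smoothing equation

$$\bar{\mathcal{G}}_{i,k} - \epsilon \left( \bar{\mathcal{G}}_{i+1,k} - 2\, \bar{\mathcal{G}}_{i,k} + \bar{\mathcal{G}}_{i-1,k} \right) = \mathcal{G}_{i,k}$$

approximates the differential equation

$$\bar{\mathcal{G}} - \frac{\partial}{\partial \xi} \epsilon \frac{\partial \bar{\mathcal{G}}}{\partial \xi} = \mathcal{G}.$$

If one sets $\delta S = -\lambda \bar{\mathcal{G}}$, then to
first order the change in the cost is

$$\delta I = \iint \mathcal{G}\, \delta S \, d\xi\, d\zeta = -\lambda \iint \left( \bar{\mathcal{G}} - \frac{\partial}{\partial \xi} \epsilon \frac{\partial \bar{\mathcal{G}}}{\partial \xi} \right) \bar{\mathcal{G}} \, d\xi\, d\zeta = -\lambda \iint \left\{ \bar{\mathcal{G}}^2 + \epsilon \left( \frac{\partial \bar{\mathcal{G}}}{\partial \xi} \right)^2 \right\} d\xi\, d\zeta < 0,$$

assuring an improvement if $\lambda$ is sufficiently small and
positive, unless the process has already reached a stationary
point at which $\mathcal{G} = 0$.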
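The two smoothing procedures can be sketched for a single line of mesh points as follows. This is a minimal illustration, not the author's implementation: the explicit weights (1/6, 4/6, 1/6) are the B-spline weights (1/4, 1, 1/4) renormalized to unit row sums, the treatment of the end points (replicated values) is an assumption, and the test gradient is made-up data.

```python
import math

def explicit_smooth(g):
    """One pass of (1/6, 4/6, 1/6) smoothing; end values are replicated (assumed)."""
    n = len(g)
    return [(g[max(i - 1, 0)] + 4.0 * g[i] + g[min(i + 1, n - 1)]) / 6.0
            for i in range(n)]

def implicit_smooth(g, eps):
    """Solve gbar_i - eps*(gbar_{i+1} - 2 gbar_i + gbar_{i-1}) = g_i (Thomas algorithm)."""
    n = len(g)
    diag = [1.0 + 2.0 * eps] * n
    diag[0] = diag[-1] = 1.0 + eps      # replicated end values (assumed)
    off = -eps                          # constant sub- and super-diagonal
    c = [0.0] * n
    d = [0.0] * n
    c[0] = off / diag[0]
    d[0] = g[0] / diag[0]
    for i in range(1, n):               # forward elimination
        m = diag[i] - off * c[i - 1]
        c[i] = off / m
        d[i] = (g[i] - off * d[i - 1]) / m
    gbar = [0.0] * n
    gbar[-1] = d[-1]
    for i in range(n - 2, -1, -1):      # back substitution
        gbar[i] = d[i] - c[i] * gbar[i + 1]
    return gbar

def roughness(g):
    """Sum of squared differences between neighboring points."""
    return sum((g[i + 1] - g[i]) ** 2 for i in range(len(g) - 1))

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

# A smooth gradient contaminated by point-to-point oscillations (made-up data).
g = [math.sin(0.3 * i) + 0.5 * (-1) ** i for i in range(32)]
g_explicit = explicit_smooth(explicit_smooth(g))   # two passes, as for B B^T
g_implicit = implicit_smooth(g, eps=2.0)
```

In both cases the inner product of the raw gradient with the smoothed gradient stays positive, so a step along the negative smoothed gradient is still a descent direction to first order.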


6.4  Design  of  Swept  Wings  for  Very  Low  Shock  Drag 

The method has been used to carry out a study of swept wing
designs which might be appropriate for long range transport
aircraft. Since three dimensional calculations require
substantial computational resources, it is extremely important
for the practical implementation of the method to use fast
solution algorithms for the flow and the adjoint equations. In
this case the author's FLO87 computer program has been used as
the basis of the design method. FLO87 solves the three
dimensional Euler equations with a cell-centered finite volume
scheme, and uses residual averaging and multigrid acceleration
to obtain very rapid steady state solutions, usually in 25 to
50 multigrid cycles [66, 70]. Upwind biasing is used to produce
non-oscillatory solutions, and assure the clean capture of
shock waves. This is introduced through the addition of
carefully controlled numerical diffusion terms, with a
magnitude of order $\Delta x^3$ in smooth parts of the flow.
The adjoint equations are treated in the same way as the flow
equations. The fluxes are first estimated by central
differences, and then modified by downwind biasing through
numerical diffusive terms which are supplied by the same
subroutines that were used for the flow equations.

The study has been focussed on wings designed for cruising at
Mach .85, with lift coefficients in the range of .5 to .55. In
every case, the wing planform was fixed while the sections were
free to be changed arbitrarily by the design method, with a
restriction on the minimum thickness. The initial wing has a
unit semi-span, with 38 degrees leading edge sweep. It has a
modified trapezoidal planform, with straight taper from a root
chord of 0.38, and a curved trailing edge in the inboard region
blending into straight taper outboard of the 30 percent span
station to a tip chord of 0.10, with an aspect ratio of 9.0.
The initial wing sections were based on a section specifically
designed by the author's two dimensional design method [73] to
give shock free flow at Mach 0.78 with a lift coefficient of
0.6. This section, which has a thickness to chord ratio of 9.5
percent, was used at the tip. Similar sections with an
increased thickness were used inboard. The variation of
thickness was non-linear with a more rapid increase near the
root, where the thickness to chord ratio of the basic section
was multiplied by a factor of 1.47. The inboard sections were
rotated upwards to give the initial wing 3.5 degrees twist from
root to tip. The two-dimensional pressure distribution of the
basic wing section at its design point was introduced as a
target pressure distribution uniformly across the span. This
target is presumably not realizable, but serves to favor the
establishment of a relatively benign pressure distribution. The
total inviscid drag coefficient, due to the combination of
vortex and shock wave drag, was also included in the cost
function. Since the main objective of the study was to minimize
the drag, the target pressure distribution was reset after
every fourth design cycle to a distribution derived by
smoothing the existing pressure distribution. This allows the
scheme more freedom to make changes which reduce drag. The
calculations were performed with the lift coefficient forced to
approach a fixed value by adjusting the angle of attack every
fifth iteration of the flow solution. It was found that the
computational costs can be reduced by using only 15 multigrid
cycles in each flow solution, and in each adjoint solution.
Although this is not enough for full convergence, it proves
sufficient to provide a shape modification which leads to an
improvement.

Figures 27 and 28 show a wing which was designed for a lift
coefficient of .50 at Mach .85. In order to prevent the final
wing from becoming too thin the threshold $S_0(\xi,\zeta)$ was
set at three quarters of the height of the bump defining the
initial wing. This calculation was performed on a mesh with 192
intervals in the $\xi$ direction wrapping around the wing, 32
intervals in the normal $\eta$ direction and 48 intervals in
the spanwise $\zeta$ direction, giving a total of 294912 cells.
The wing was specified by 33 sections, each with 128 points,
giving a total of 4224 design variables. The plots show the
initial wing geometry and pressure distribution, and the
modified geometry and pressure distribution after 40 design
cycles. The total inviscid drag coefficient was reduced from
0.0210 to 0.0112. The initial design exhibits a very strong
shock wave in the inboard region. It can be seen that this is
completely eliminated, leaving a very weak shock wave in the
outboard region. To verify the solution, the final geometry was
analyzed with another method, using the computer program FLO67.
This program uses a cell-vertex formulation, and has recently
been modified to incorporate a local extremum diminishing
algorithm with a very low level of numerical diffusion [76].
When run to full convergence it was found that a better
estimate of the drag coefficient of the redesigned wing is
0.0094 at Mach 0.85 with a lift coefficient of 0.5, giving a
lift to drag ratio of 53. The results from FLO67 for the
initial and final wings are illustrated in Figures 29 and 30. A
calculation at Mach 0.500 shows a drag coefficient of 0.0087
for a lift coefficient of 0.5. Since in this case the flow is
entirely subsonic, this provides an estimate of the vortex drag
for this planform and lift distribution, which is just what one
obtains from the standard formula for induced drag,
$C_D = C_L^2 / (e \pi AR)$, with an aspect ratio $AR = 9$, and
an efficiency factor $e = 0.97$. Thus the design method has
reduced the shock wave drag coefficient to about 0.0007 at a
lift coefficient of 0.5. Figure 31 shows the result of an
analysis for an off design point with the Mach number increased
to .86 with the same lift coefficient of .5. This results in a
flat-topped pressure distribution terminating with a weak shock
of nearly uniform strength across the whole span. The drag
coefficient is .0097. The penalty of .0003 is so small that
this might be a preferred cruising condition.

A second wing was designed in exactly the same manner as the
first, starting from the same initial geometry and with the
same constraints, to give a lift coefficient of .55 at Mach
.85. This produces stronger shock waves and is therefore a more
severe test of the method. In this case the total inviscid drag
coefficient was reduced from 0.0243 to 0.0134 in 40 design
cycles. Again the performance of the final design was verified
by a calculation with FLO67, and when the result was fully
converged the drag coefficient was found to be 0.0115. A
subsonic calculation at Mach .500 shows a drag coefficient of
0.0107 for a lift coefficient of 0.55. Thus in this case the
shock wave drag coefficient is about 0.0008. For a
representative transport aircraft the parasite drag coefficient
of the wing due to skin friction is about 0.0045. Also the
fuselage drag coefficient is about 0.0050, the nacelle drag
coefficient is about 0.0015, the empennage drag coefficient is
about 0.0020, and the excrescence drag coefficient is about
0.0010. This would give a total drag coefficient $C_D = 0.0255$
for a lift coefficient of 0.55, corresponding to a lift to drag
ratio $L/D = 21.6$. This would be a substantial improvement
over the values obtained by currently flying transport
aircraft.
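The drag bookkeeping in this section can be reproduced with a few lines of Python. The sketch below uses only the coefficients quoted in the text; the dictionary keys and the helper name `shock_drag` are illustrative choices.

```python
# Drag coefficients quoted in the text for the two redesigned wings.
wings = [
    {"cl": 0.50, "cd_transonic": 0.0094, "cd_subsonic": 0.0087},
    {"cl": 0.55, "cd_transonic": 0.0115, "cd_subsonic": 0.0107},
]

def shock_drag(wing):
    """Shock drag estimate: inviscid drag at Mach .85 minus the subsonic (vortex) drag."""
    return wing["cd_transonic"] - wing["cd_subsonic"]

# Remaining contributions quoted for a representative transport aircraft.
other_drag = {
    "wing skin friction": 0.0045,
    "fuselage": 0.0050,
    "nacelle": 0.0015,
    "empennage": 0.0020,
    "excrescence": 0.0010,
}

second = wings[1]
cd_total = second["cd_transonic"] + sum(other_drag.values())
lift_to_drag = second["cl"] / cd_total

for wing in wings:
    print(f"CL={wing['cl']:.2f}: shock drag ~ {shock_drag(wing):.4f}, "
          f"inviscid L/D ~ {wing['cl'] / wing['cd_transonic']:.0f}")
print(f"total CD = {cd_total:.4f}, L/D = {lift_to_drag:.1f}")
```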


6.5  Optimization  of  Complex  Configurations 

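In order to treat more complex configurations one can use a
numerical grid generation procedure to produce a body-fitted
mesh for the initial geometry, and then modify the mesh in
subsequent design cycles by an analytic perturbation formula.
In the two-dimensional case, for example, with computational
coordinates $\xi$, $\eta$, let the boundary displacement at
$\eta = 0$ be $\delta x_b(\xi)$, $\delta y_b(\xi)$. Then the
mesh points along the radial coordinate lines $\xi$ = constant
can be displaced by

$$\delta x(\xi,\eta) = \mathcal{R}(\eta)\, \delta x_b(\xi), \qquad \delta y(\xi,\eta) = \mathcal{R}(\eta)\, \delta y_b(\xi),$$

yielding

$$\delta K = \begin{bmatrix} \mathcal{R}(\eta)\, \dfrac{\partial}{\partial \xi} \delta x_b & \mathcal{R}'(\eta)\, \delta x_b \\[2mm] \mathcal{R}(\eta)\, \dfrac{\partial}{\partial \xi} \delta y_b & \mathcal{R}'(\eta)\, \delta y_b \end{bmatrix}.$$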


Such  a  procedure  has  been  implemented  by  J.  Reuther  for 
the  three-dimensional  Euler  equations,  and  applied  to  the 
optimization  of  wing-body  configurations  [143]. 
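A minimal two-dimensional sketch of such an analytic mesh perturbation is given below. The text specifies only that the displacement is scaled by a function of $\eta$; the linear decay `R` used here (1 on the body, 0 at the outer boundary) and the boundary displacements `dxb`, `dyb` are assumptions made purely for illustration. The analytic variation of the mapping derivatives is checked against central differences of the displacement field.

```python
import math

ETA_MAX = 1.0   # assumed outer edge of the mesh in eta

def R(eta):
    """Assumed decay function: 1 on the body (eta = 0), 0 at eta = ETA_MAX."""
    return 1.0 - eta / ETA_MAX

def dR(eta):
    """Derivative of the assumed decay function."""
    return -1.0 / ETA_MAX

# Hypothetical boundary displacements and their xi-derivatives.
def dxb(xi):
    return 0.01 * math.sin(xi)

def dyb(xi):
    return 0.02 * math.cos(2.0 * xi)

def dxb_xi(xi):
    return 0.01 * math.cos(xi)

def dyb_xi(xi):
    return -0.04 * math.sin(2.0 * xi)

def mesh_displacement(xi, eta):
    """Displacement of the mesh point (xi, eta) induced by the boundary movement."""
    return R(eta) * dxb(xi), R(eta) * dyb(xi)

def delta_K(xi, eta):
    """Analytic variation of K_ij = dx_i/dxi_j caused by the displacement."""
    return [[R(eta) * dxb_xi(xi), dR(eta) * dxb(xi)],
            [R(eta) * dyb_xi(xi), dR(eta) * dyb(xi)]]

def delta_K_fd(xi, eta, h=1.0e-6):
    """The same matrix from central differences of the displacement field."""
    xe, ye = mesh_displacement(xi + h, eta)
    xw, yw = mesh_displacement(xi - h, eta)
    xn, yn = mesh_displacement(xi, eta + h)
    xs, ys = mesh_displacement(xi, eta - h)
    return [[(xe - xw) / (2 * h), (xn - xs) / (2 * h)],
            [(ye - yw) / (2 * h), (yn - ys) / (2 * h)]]

dK = delta_K(0.8, 0.25)
dK_fd = delta_K_fd(0.8, 0.25)
```

Because the perturbation is analytic, the metric variations needed for the gradient follow directly from the boundary displacement, without regenerating the mesh.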

It is also possible to show that in the continuous limit the
field integral in equation (57) can be eliminated. Let the
change in the coordinates $x_i$ at fixed $\xi$ be
$\delta x_i(\xi)$. Then, using the fact that the fluxes
$f_j(w)$ satisfy the flow equation (48), it is possible to show
by a direct calculation

^  An  f -—n 


where 


$$\delta \xi = K^{-1} \delta x.$$


A detailed derivation is given in reference [78]. Thus the
perturbation equation can be written as

$$\frac{\partial}{\partial \xi_i} \left\{ C_i (\delta w + \delta w^*) \right\} = 0,$$

where $\delta w$ is the variation in the solution at fixed
$\xi$ caused by the change in the boundary, while $\delta w^*$
is the change in the original solution $w(\xi)$ corresponding
to the mesh movement $\delta x(\xi)$.


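Now

$$\int_V \psi^T \frac{\partial}{\partial \xi_i} \delta F_i \, dV = \int_B n_i \psi^T \delta F_i \, dB - \int_V \frac{\partial \psi^T}{\partial \xi_i} C_i (\delta w + \delta w^*) \, dV,$$

and if $\psi$ satisfies the adjoint equation the entire field
integral is eliminated, leaving only the boundary integral in
equation (57).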

In an actual discretization the field terms are not zero, but
this result suggests that they should be small if a fine enough
mesh is used, and might be dropped. This allows a drastic
simplification of the treatment of complex configurations.
Preliminary numerical experiments with airfoil and wing
calculations indicate roughly the same convergence with and
without the field terms in the gradient.


7.  OUTLOOK  AND  CONCLUSIONS 

Better algorithms and better computer hardware have contributed
about equally to the progress of computational science in the
last two decades. In 1970 the Control Data 6600 represented the
state of the art in computer hardware with a speed of about
$10^6$ operations per second (one megaflop), while in 1990 the
8 processor Cray YMP offered a performance of about $10^9$
operations per second (one gigaflop). Correspondingly,
steady-state Euler calculations which required 5,000-10,000
steps prior to 1980 could be performed in 10-50 steps in 1990
using multigrid acceleration. With the advent of massively
parallel computers it appears that the progress of computer
hardware may even accelerate. Teraflop machines offering
further improvement by a factor of 1,000 are likely to be
available within a few years. Parallel architectures will force
a reappraisal of existing algorithms, and their effective
utilization will require the extensive development of new
parallel software.

In parallel with the transition to more sophisticated
algorithms, the present challenge is to extend the effective
use of CFD to more complex applications. A key problem is the
treatment of multiple space and time scales. These arise not
only in turbulent flows, but also in many other situations such
as chemically reacting flows, combustion, flame fronts and
plasma dynamics. Another challenge is presented by problems
with moving boundaries. Examples include helicopter rotors, and
rotor-stator interaction in turbomachinery. Algorithms for
these problems can be significantly improved by innovative
concepts, such as the idea of time inclining. It can be
anticipated that interdisciplinary applications in which CFD is
coupled with the computational analysis of other properties of
the design will play an increasingly important role. These
applications may include structural, thermal and
electromagnetic analysis. Aeroelastic problems and integrated
control system and aerodynamic design are likely target areas.
The development of improved algorithms continues to be
important in providing the basic building blocks for numerical
simulation. In particular, better error estimation procedures
must be developed and incorporated in the simulation software
to provide error control. The basic simulation software is only
one of the needed ingredients, however. The flow solver must be
embedded in a user-friendly system for geometry modeling,
output analysis, and data management that will provide a
complete numerical design environment. These are the
ingredients which are needed for the full realization of the
concept of a numerical wind tunnel. Figures 19 and 20
illustrate the way in which a numerical wind tunnel might
evolve from current techniques, which involve massive data
handling tasks, to a fully integrated design environment.

Figure 19: Concept for a numerical wind tunnel.

Figure 20: Advanced numerical wind tunnel.

In the long run, computational simulation should become the
principal tool of the aerodynamic design process because of the
flexibility it provides for the rapid and comparatively
inexpensive evaluation of alternative designs, and because it
can be integrated in a numerical design environment providing
for both multi-disciplinary analysis and multi-disciplinary
optimization.

Figure 21: Navier-Stokes Predictions for the F-18 Wing-Fuselage at Large Incidence.

Figure 22: O-Topology Meshes, 160x32. 22a: RAE-2822 Airfoil. 22b: NACA-0012 Airfoil.

Figure 23: RAE-2822 Airfoil at Mach 0.750 and α=3.0°. H-CUSP Scheme. 23a: Cp after 25 Cycles, Cl=1.1312, Cd=0.0469.

Figure 24: NACA-0012 Airfoil at Mach 0.800 and α=1.25°. H-CUSP Scheme. 24a: Cp after 35 Cycles, Cl=0.3654, Cd=0.0232. 24b: Convergence.

Figure 25: NACA-0012 Airfoil at Mach 0.850 and α=1.0°. H-CUSP Scheme. 25a: Cp after 35 Cycles, Cl=0.3861, Cd=0.0582. 25b: Convergence.

Figure 26: Onera M6 Wing. Mach 0.840, Angle of Attack 3.06°, 192x32x48 Mesh. CL=0.3041, CD=0.0131. H-CUSP scheme. 26a: 12.50% Span, Cl=0.2933, Cd=0.0274. 26b: 31.25% Span, Cl=0.3139, Cd=0.0159. 26c: 50.00% Span, Cl=0.3262, Cd=0.0089. 26d: 68.75% Span, Cl=0.3195, Cd=0.0026.

Figure 27: Swept Wing Design Case (1), M=0.85, Fixed Lift Mode. Drag Reduction at CL=.5. 27a: Initial Wing, CL=0.5001, CD=0.0210, α=-1.672°. 27b: 40 Design Iterations, CL=0.5000, CD=0.0112, α=-0.

Figure 29: FLO67 solution for initial wing. M=0.85, CL=0.4997, CD=0.0207, α=-1.970°. 29a: span station z=0.00. 29b: span station z=0.312. 29c: span station z=0.625. 29d: span station z=0.937.

Figure 30: FLO67 check on redesigned wing. M=0.85, CL=0.4992, CD=0.0094, α=-0.300°. 30a: span station z=0.00. 30b: span station z=0.312. 30c: span station z=0.625. 30d: span station z=0.937.

Figure 31: FLO67 check on redesigned wing at a higher Mach number. M=0.86, CL=0.4988, CD=0.0097, α=-0.440°. 31a: span station z=0.00. 31b: span station z=0.312. 31c: span station z=0.625. 31d: span station z=0.937.


SUMMARY

Changes  are  taking  place  in  the  world  of 
CFD  that  extend  beyond  the  technical.  They
include  change  to  the  "research  engine,"  the 
system  infrastructure  that  powers  CFD 
research,  as  it  seeks  to  adapt  to  the  new 
industrial  paradigm  that  is  sweeping  the 
aeronautics  industry,  and  the  world.  The 
"research  engine"  involves  government, 
academia,  and  industry.  Because  it  is  a 
system,  all  parts  of  it  must  participate  in 
change.  None  of  the  parts  can  exist  in 
isolation. 

This  paper  analyzes  the  workings  of  the 
research  engine  and  finds  that  it  is 
encountering  considerable  strain.  Resources 
for  all  elements  of  research  are  below 
historic  levels.  "Money  givers"  are  faced 
with  a  lack  of  metrics  and  infrastructure  for 
telling  them  how  to  invest  their  resources 
except  in  high  level  terms.  Leaders  of 
research  are  having  to  redefine  their  jobs. 
Researchers  are  hunkered  down  to  wait  it 
out.  And  value  systems  are  in  disarray  and 
conflict.  The  adaptation  of  the  research 
engine  to  the  changing  world  is  far  from 
complete.  It  is  in  transition. 

The  paper  goes  on  to  describe  what  the 
author  believes  to  be  the  principal 
characteristics  and  attributes  of  a  well-functioning
research  engine,  together  with  a
few  personal  experiences  that  shed  some 
light  on  how  those  attributes  can  be 
achieved.  He  concludes  that  further 
adaptation  of  the  research  engine  will  be 
paced  by  two  key  factors.  One  is  the  need  to 
change  the  types  of  communication  that  take 
place  between  the  research  community  and 


the  engineering  community  in  industry.  The 
other  is  the  need  to  unshackle  the  minds  of 
researchers  from  the  imprisonment  of  an 
overly  narrow  value  system,  a  task  which 
must  be  led  by  the  money  givers  who  inhabit 
the  research  engine. 


INTRODUCTION 

I  find  it  interesting  to  contemplate  those 
topics  which  are  likely  to  be  the  pacing 
items  and  new  challenges  in  CFD. 
Traditionally,  such  an  endeavor  would  focus 
on  the  technology  issues  associated  with 
CFD;  things  like  algorithmic  developments, 
hardware  architectures,  and  so  forth. 
Concerning  the  latter,  Pradeep  Raj  has 
recently  presented  an  up-to-date  review 
(ref.  1)  of  the  issues  and  pacing  items  in 
CFD  technology.  It  addresses  both  the 
functional  characteristics  and  the  operational 
requirements  that  tomorrow's  CFD  codes 
must  have  in  order  to  be  effective,  and  it 
speaks  for  the  U.S.  aeronautical  industry. 
Also,  Professor  Antony  Jameson,  the  first 
keynote  speaker,  will  share  with  us  his 
vision  of  the  technical  challenges  and  future 
developments  in  CFD.  There  is  little  that  I 
could  add  to  their  remarks.  Therefore,  I  will 
focus  my  remarks  on  challenges  and  pacing 
items  that  extend  beyond  the  technical.

We  live  at  an  interesting  time.  Our  world  is 
immersed  in  a  period  of  large  and  rapid 
change.  It  is  moving  away  from  a  dogmatic 
belief  and  reliance  upon  technology 
innovation  as  being  the  most  significant 
element  of  competitiveness.  That  is  being 
replaced  by  a  new  paradigm,  one  that  is 
centered  about  customer  satisfaction,  quality 
and  value  as  key  goals.

We  in  industry  are  pursuing  those  goals  by
focusing  on  processes  (ref.  2).  We  now
understand  that  the  key  to  developing  better
airplanes  is  to  analyze,  understand,  and
improve  the  processes  by  which  airplanes
are  created.  Similarly,  the  key  to  developing
better  CFD  is  to  analyze,  understand,  and
improve  the  processes  by  which  CFD  is
created.  We  also  now  understand  that  the
leading  principle  of  good  processes  is
customer  focus  and  customer  satisfaction.
That  principle  applies  equally  to  the
processes  that  produce  airplanes  and  the
processes  that  produce  CFD.

And  so  it  seems  to  me  that  the  most 
significant  pacing  item  in  the  world  of  CFD 
is  the  need  to  analyze,  understand  and 
improve  the  process  by  which  CFD 
capabilities  are  created.  I  call  that  process 
the  research  engine.  There  is  more  leverage 
in  fixing  up  the  research  engine  and  adapting 
it  to  the  changes  in  the  world  than  in 
anything  else  I  can  think  of.  And  so  that  is 
what  I  am  going  to  talk  about. 


The  "Old"  Research  Engine  and  How  it 
Worked 

The  research  engine  as  we  know  it  today 
involves  industry,  academia,  and 
government.  Those  three  components 
interact  with  each  other  as  a  system.  And 
like  most  systems,  one  component  cannot  be 
changed  without  affecting  the  others.  It 
doesn't  work  for  industry  to  change  and  the 
others  not  to  change.  We  are  all  in  this 
together. 

The  need  to  change  pervades  the  entire 
research  infrastructure.  It  involves
information  systems  and  the  methods  by 
which  we  communicate,  including  the  holy 
grails  of  technical  societies,  publications, 
and  technical  conferences.  It  involves  the 
changing  of  value  systems,  which  is  almost 
a  cultural  characteristic.  And  reward 
systems.  Changing  is  not  easy. 

I  would  like  to  begin  by  examining  how  the 
research  engine  functioned  in  the  era  that  we 
are  leaving  behind.  Figure  1  (see  Page  2-3) 
presents  a  description  of  the  fundamental 


factors  and  forces  that  powered  the  research 
engine.  This  description  appears  to  be 
universal.  It  looks  the  same,  no  matter 
whether  you  reside  in  industry,  academia,  or 
in  a  government  laboratory.  It  works  the 
same  way.  Only  the  names  of  the  players 
may  differ. 

Key  players  are  the  money  givers.  Their
role  is  to  divide  up  money  into  various  large
buckets,  each  directed  at  a  particular 
category  of  research,  and  to  distribute  it. 

We  all  know  who  those  people  are.  They 
are  the  ones  to  whom  we  write  research 
proposals.  Money  givers  can  be  found  in 
NASA,  in  the  National  Science  Foundation, 
in  the  Department  of  Defense,  and  in  similar 
institutions  in  Europe.  They  also  are  present 
in  industry. 

Most  money  givers  are  not  close  to  the  real 
details  of  airplane  design,  or  to  the  detailed 
processes  that  use  CFD  as  a  tool.  They 
operate  at  a  higher,  more  strategic  level.  But 
they  still  need  criteria  by  which  to  decide 
how  to  divide  up  the  money.  It  is  instructive 
to  take  a  look  at  what  some  of  those  criteria 
were. 

One  such  criterion  was  to  divide  up  the
money  based  on  historical  precedent.  That 
was,  and  still  is,  practiced  far  and  wide.  It  is 
a  symptom  of  zero  accountability  and  zero 
ability  to  discern  what  is  important. 

Money  givers  are  also  susceptible  to  being 
influenced  by  the  visionary  utterances  of  the 
people  who  inhabit  the  lower  left  box  of 
figure  1,  the  research  leaders.  Research 
leaders  are  in  the  business  of  creating  and 
marketing  visions  of  how  to  make  the  world 
better.  Many  of  them  have  become  very 
good  at  creating  visions  for  research  that 
will  be  looked  favorably  upon  by  the  money 
givers.  They  treat  the  money  giver  as  the 
customer.  One  result  of  that,  of  course,  is 
that  the  research  funding  decisions  that  get 
made  can  be  quite  unrelated  to  the  true 
needs  of  the  people  who  design  airplanes  for 
a  living. 

Money  givers  also  are  desirous  of  evaluating
the  caliber  of  the  researchers  to  whom  they
will  give  money.  It  is  rarely  possible  to
point  to  a  feature  on  an  airplane  and  say
"this  research  contributed  to  ___."  So,  it  was
necessary  to  establish  other  measures  in
order  to  create  a  value  system  which  could
be  applied  to  individuals.

[Figure  1  diagram:  boxes  labeled  Money  Givers,  Research  Leaders,  and  Researchers,  with  a  list  of  value-system  measures  such  as  prestige  among  peers  and  number  of  refereed  papers.]

Figure  1.  The  Engine  That  Powered  CFD  Research

One  popular  measure  was  to  look  at  the 
prestige  bestowed  upon  a  researcher,  not  by 
his  customers,  but  by  his  peers,  the  other 
researchers.  Can  you  imagine  what 
automobiles  would  be  like  if  the  criterion  for
designing  them  was  to  please  the  other 
designers,  rather  than  the  people  who  want 
to  use  cars  to  drive  about  in? 

Another  popular  measure  has  been  to  count 
the  number  of  refereed  papers  that  are 
produced  by  a  researcher.  One  consequence 
of  this  is  that  our  journals  and  conferences 
have  become  littered  with  papers  whose  real 
contribution  is  low  or  nonexistent.  The 
journals  have  evolved  into  being  primarily  a 
scorekeeping  system.  Scientific  information 


today  travels  largely  by  other  means. 
Another  consequence  of  the  numbers  game 
is  that  it  encourages  researchers  to  attack 
problems  that  they  know  how  to  solve  rather 
than  the  problems  that  need  to  be  solved. 
And  so  our  entire  research  infrastructure  was
caught  up  in  a  value  system  that  was  largely 
unrelated  to  what  was  important  to  the 
engineers  who  design  and  build  airplanes  for 
a  living.  What  counted  was  paying  homage 
to  a  value  system  that  controlled  access  to 
the  annual  pot  of  money  necessary  to 
support  the  research  leader  and  his/her  staff. 

A  standard  part  of  the  job  of  being  a 
research  group  leader  was  also  to  make  all 
of  the  important  decisions  concerning  the 
detailed  content  of  the  annual  research  plan. 
After  all,  since  research  leaders  are  normally 
exposed  to  new  and  emerging  technology 
that  a  design  engineer  is  not,  it  was  quite 
obvious  to  research  leaders  that  they,  and 
not  design  engineers,  should  be  in  charge  of 




defining  the  annual  research  plan.  And  so 
the  design  engineering  community  was 
excluded  from  participating. 

The  researchers  themselves  focused  their 
work  on  paying  homage  to  the  value  system, 
because  that  is  what  entitled  them  to  go  back 
to  the  well  for  next  year's  funding,  and  to 
become  eminent  in  the  eyes  of  their  peers. 

So  there  it  is.  A  stable,  self-sustaining 
research  engine  that  was  capable  of 
functioning  quite  smoothly,  all  by  itself  in 
its  own  little  world.  It  did  so  for  many 
years.  Its  weakness,  of  course,  is  that  it  had 
been  almost  disconnected  from  the 
community  of  people  whom  we  now
understand  to  be  the  customers  of  CFD
research,  namely  the  practicing  engineers 
who  design  airplanes  for  a  living. 

Figure  2  (see  Page  2-5)  exhibits  the 
interfaces,  such  as  they  were,  that  existed 
between  the  research  engine  and  the 
aeronautical  industry.  One  such  interface 
involved  the  money  givers,  who  were  visited 
periodically  by  clouds  of  collective  wisdom 
passing  overhead.  Those  clouds  appeared  in 
the  form  of  high  level  advisory  committees, 
wishes  of  the  U.S.  Congress,  or  of  industry 
executives,  depending  upon  where  the 
money  giver  happened  to  reside.  It  is  not 
entirely  coincidental  that  these  clouds  are 
shown  to  be  comprised  of  the  condensation 
of  hot  air  rising  from  airplane  companies.  In 
any  event,  the  resulting  fallout  from  these 
clouds  caused  the  money  givers  to 
occasionally  re-balance  their  research 
portfolios. 

The  other  interface  lay  between  the 
researchers  and  the  practicing  engineers  who 
reside  in  airplane  companies.  This  interface 
is  characterized  by  the  fence  in  figure  2. 
Interestingly,  the  site  of  the  fence  was  not 
always  in  front  of  the  door  of  the  airplane 
company.  It  frequently  could  be  found 
inside  the  airplane  company,  standing 
between  the  internal  company  research 
department  and  the  practicing  engineers  who 
designed  the  airplanes.  In  those  cases,  the 
company  research  departments  paid  most 
allegiance  to  the  research  engine  and  acted 
as  an  integral  part  of  it,  particularly  if  they 


were  dependent  upon  outside  contract 
funding  as  a  source  of  research  money. 

Communication  over  the  fence  was  mostly 
one  way.  It  consisted  primarily  of  attempts 
by  researchers  to  interest  the  engineering 
community  in  the  products  of  their  research. 
The  system  coined  a  name  for  this,  calling  it 
"technology  transfer." 

The  favored  means  of  lofting  the  results  of 
CFD  research  over  the  fence  was  to  send  it 
across  on  the  wings  of  a  scientific 
publication.  The  publication  was  the 
messenger  that  told  of  its  charms  and 
attributes.  And  to  make  sure  that  at  least 
some  folks  in  the  airplane  company  would 
see  it,  the  researcher  empowered  his  delivery 
system  to  honk,  to  attract  attention.  Such 
honking  is  frequently  heard  at  technical 
conferences  and  symposia.  In  fact,  that 
seems  to  have  become  the  prime  motivation 
for  conference  attendance.  Overlooked  was 
the  fact  that  airplane  design  engineers  rarely 
attended  those  conferences. 

The  boards  of  the  fence  have  names 
inscribed  upon  them,  entitled  "conferences,"
"journals,"  "perceptions,"  "value  systems,"
"reward  systems,"  etc.  Those  pillars  of
tradition  and  conventionality  are  turning  out 
to  be  among  the  factors  that  impede  our 
ability  to  create  a  research  engine  that  is 
more  properly  connected  to  the  customer. 

It  was  a  very  eye-opening  experience  to  us 
in  the  United  States  when  NASA  instituted 
some  dramatic  changes  in  communication. 
They  changed  the  format  of  some  of  their 
conferences  from  one  wherein  the 
researchers  did  all  of  the  honking  to  one  in 
which  industry  did  most  of  the  talking  and 
researchers  did  most  of  the  listening.  Lo  and 
behold,  it  was  discovered  that  the  research 
community  was  not  in  fact  immune  to 
learning  about  what  was  important.  We 
found  that  they  could  even  learn  from  people 
who  didn't  have  PhDs  and  a  lengthy  record 
of  refereed  publications.  The  power  of  two- 
way  communication  began  to  be  unlocked! 

So,  somewhere  along  this  journey  of  change
we  must  abandon  or  at  least  supplement  our
old,  one-way  habits  of  communication  as
institutionalized  by  the  conventional
scientific  establishment.  We  must  replace
them  with  forms  of  communication  that  do
the  job  that  needs  to  be  done.  We  are  faced
with  the  challenge  of  tearing  down  a  fence
whose  pillars  appear  to  be  set  in  solid
concrete.

Figure  2.  Interfaces  With  the  Airplane  Industry


The  Status  of  Change 

The  process  of  changing  has  begun.  Various 
people  and  organizations  in  all  parts  of  the 
research  engine  are  experimenting  with  new 
and  different  ways  of  operating.  We  are 
searching  for  a  more  effective  research 
engine,  but  we  have  not  found  it  yet. 

When  I  look  around,  I  see  an  increased  level 
of  tension  throughout  the  infrastructure. 
Many  researchers  feel  pressured  to  become 
more  "applied."  NASA  is  being  battered 
from  many  sides,  with  some  voices  calling 
for  them  to  get  back  to  basic  research  while 
others  are  calling  for  them  to  increase  their 


relevance  to  industry's  needs.  Academia  is 
struggling  to  play  a  part,  while  finding  a  role 
for  the  individual  graduate  student  and  the 
educational  mission.  There  are  many 
conflicting  forces  at  work.  It  is  a  difficult 
problem  to  even  think  about,  much  less 
resolve. 

But  we  have  changed.  Figure  3  (see  Page  2-6)
presents  my  view  of  the  current  state  of
affairs  in  the  United  States.  In  the  right
hand  part  of  the  figure  one  can  observe  a
new  player  appearing  in  industry.  At
Boeing  we  call  these  people  "process
owners,"  but  that  is  not  a  universally  used
name.  What  is  universal  is  the  realization 
within  airplane  companies  that  processes  are 
really  important  and  that  somebody  must 
therefore  be  in  charge  of  them. 

And  so,  these  "process  owners"  represent  a 
new  connection  to  the  research  engine. 

They  have  created  a  gap  in  the  fence  through
which  their  voices  are  being  increasingly
heard.  Money  givers,  research  leaders,  and
researchers  alike  in  government,  academia,
and  industry  are  being  exposed  increasingly
to  their  input.  They  are  positioned  to  evolve
eventually  into  a  strong  component  of  the
overall  research  engine.

Figure  3.  The  Research  Engine  in  Transition

The  life  of  a  money  giver  has  become  more 
challenging.  They  now  all  subscribe  to  the 
new  vision  of  investing  in  things  that  reduce 
cycle  time,  cost,  and  so  forth.  But  they 
mostly  lack  an  infrastructure  of  established 
methods  and  metrics  to  guide  them.  They
are  inventing  and  innovating  as  they  go. 

The  life  of  a  research  leader  is  also 
changing.  The  more  progressive  ones  view 
their  new  role  to  be  to  define  and  manage 
the  process  of  developing  the  annual 
research  plan,  rather  than  personally  making 
the  planning  decisions.  The  new  style  of 
operating  that  I  most  frequently  encounter  is 
for  the  research  leader  and  researchers  to 
simply  ask  the  process  owners  what  they 
want  and  need,  and  then  to  set  about 
implementing  it.  That  is  not  leading  to 
many  of  the  attributes  that  we  desire  in  the 
research  engine. 


The  individual  researchers  are  impacted  by 
this  evolving  research  engine  and  its 
changing  power  structure.  They  feel
buffeted  from  several  directions,  not  the
least  of  which  is  a  value  system  that  is
crumbling  and  in  disarray.  Their  attitude  is
"don't  make  waves  and  do  what  people  in 
power  say  they  want  done."  They  are  not 
particularly  happy. 

And  so  we  have  not  yet  arrived  at  a  properly 
functioning  research  engine.  I  don't  have  a 
complete  vision  of  what  that  research  engine 
would  look  like,  but  I  do  know  many  of  the 
attributes  that  it  should  have.  Those 
attributes  include: 

•  a  lead  role  in  supporting  the  strategic 
direction  set  by  the  industrial 
enterprise. 

•  a  proper  balance  between  basic  and 
applied  research  across  the  R&D 
food  chain. 

•  a  recognition  of  the  importance  of 
vision  building  within  the  research 
process. 




•  ability  to  draw  upon  all  of  our 
intellectual  resources,  both  in 
developing  the  research  plan  and  in 
executing  it. 

•  nimbleness  in  translating  the  output 
of  research  into  products  and 
processes. 

•  a  value  system  that  causes  more  of 
the  "right"  things  to  occur.

•  and  perhaps  most  important  of  all,  a 
value  system  that  supplies  high 
levels  of  human  motivation  and 
sense  of  worth,  one  that  leads  people 
from  within  to  do  more  of  the  right 
things,  and  to  make  it  fun  once  again 
to  be  a  researcher. 

Even  though  my  vision  of  how  to 
accomplish  all  of  that  is  yet  incomplete,  I 
find  within  myself  a  growing  conviction 
about  some  of  the  things  that  tomorrow’s 
research  engine  must  contain.  One  of  those 
things  is  a  better  understanding  of  the  proper
distribution  of  roles,  responsibilities,  and 
core  competencies  that  should  prevail  across 
the  R&D  food  chain  comprising  industry, 
academia,  and  government.  What  that 
distribution  should  be  can  be  derived  by 
testing  it  against  the  axioms  that  accompany 
the  new  industrial  paradigm,  an  exercise  that 
certain  segments  of  the  research 
establishment  find  to  be  somewhat 
threatening.  One  outcome  of  that  testing  is 
the  finding  that  a  best  and  proper  role  for 
academia,  and  for  much  of  NASA,  is  to 
concentrate  on  the  foundational, 
overarching,  enabling  technology  research 
which  comprises  the  head  of  the  R&D  food 
chain. 

Another  of  my  convictions  is  that  we  must 
find  a  much  better  way  of  connecting  the  top 
and  the  bottom  of  the  R&D  food  chain.  This 
is  something  that  we  as  a  country  have  not 
yet  learned  to  do  well  at  all.  And  yet  the 
issues  involved  are  central  to  achieving  a 
research  engine  that  contains  the  attributes 
that  we  desire. 

Connecting  the  two  ends  of  the  food  chain  is 
an  issue  in  communication.  We  have  to 
develop  an  understanding  of  what  needs  to 


be  communicated,  and  when.  And  then  we 
have  to  institute  mechanisms  to  make  it 
happen. 

I  have  been  fortunate  enough  to  have 
enjoyed  the  privilege  of  running  a  research
operation  that  encompassed  the  entire  span 
of  a  research  food  chain,  from  foundational, 
enabling  algorithm  technology,  and  fluid 
mechanics,  to  production  software,  and 
customer  support.  In  that  position,  I  was 
able  to  experiment,  so  I  learned  about  some 
things  that  don't  work  and  other  things  that 
do  work  in  properly  connecting  the  two  ends 
of  the  R&D  food  chain. 

One  thing  that  doesn't  work  well  at  all  is  to 
have  research  leaders  at  the  head  of  the  food 
chain  simply  ask  the  folks  at  the  bottom  of 
the  food  chain  what  they  want  or  need,  and 
then  to  blindly  carry  out  their  wishes.  That 
leads  mostly  to  short-term,  evolutionary 
improvements  of  limited  vision.  It  leads  to 
tactical  research  rather  than  the  strategic 
research  which  belongs  at  the  head  of  the 
food  chain,  and  it  places  the  researchers  in 
the  position  of  "the  boiler  room"  staff.  They 
have  much  more  to  offer  than  that.  Many 
people  have  yet  to  learn  the  true  meaning  of 
the  words  "customer  focus"  that  have 
entered  our  language. 

What  does  work,  not  only  well  but 
incredibly  well  in  connecting  the  two  ends 
of  the  R&D  food  chain,  is  to  do  the 
following  four  things:

1.  Eliminate  the  constraints  imposed 
in  a  researcher's  mind  by  the  value 
system  under  which  he/she  was 
educated.  Make  it  O.K.  to  do 
things  that  are  outside  of  the  limits 
imposed  by  an  overly  narrow  value 
system.  Create  a  mind-set  and  a 
curiosity  within  the  researcher  to 
wander  freely  up  and  down  the 
R&D  food  chain  and  even  into 
manufacturing. 

This  is  easier  said  than  done.  But  it  can  be 
done.  I’ve  done  it!  It  has  to  be  done, 
because,  more  than  anything  else,  it  is  the 
key  that  unlocks  the  power  and  the 
potential  of  the  highly  educated,  highly 
paid  research  people  whose  minds  have
been  refined  and  proven  by  our  rigorously 
competitive  academic  system  up  to  the  PhD 
and  post  doctoral  levels.  The  organization 
or  the  nation  that  does  this  earliest  and  best 
will  have  a  very  significant  competitive 
advantage. 

2.  Expose  and  educate  the  researchers 
who  inhabit  the  head  of  the  food 
chain  in  the  high  level  strategic 
thinking  that  supports  the 
enterprise  in  which  their  customers 
are  engaged.  We  don't  do  that  very 
well  today,  and  yet  this  is  the 
element  that  enables  researchers  to 
identify  and  prioritize  head-of-the- 
food-chain  research  topics  in 
accordance  with  their  strategic 
leverage.  It  must  be  realized  that 
strategically  relevant  research 
should  be  the  primary 
responsibility  of  the  head  of  the 
research  food  chain.  The  lower 
levels  of  the  chain,  where  most 
process  owners  reside,  are  focused 
on  tactical  implementations. 

3.  Expose  the  researcher  to  the  real 
world  of  the  aerospace  engineer. 
This  only  works  well  when  carried 
out  at  the  engineering  site.  Let  the 
researcher  "look  over  the  shoulder" 
of  the  engineering  community  or 
the  process  owner  as  they 
encounter  their  daily  challenges. 
Let  the  researcher  build  personal 
relationships  with  real  engineers. 
What  works  even  better  is  to 
expose  a  team  of  researchers 
encompassing  a  complementary  set 
of  differing  skills  and  strengths, 
because  you  will  then  be  deploying 
a  more  complete  set  of  intellectual 
assets.  The  imperative  is  to
"enable  the  researcher  to  look
beyond  what  the  customer
says  he/she  needs,  and  to
formulate  a  vision  of  what
he/she  could  provide  that
would  really  be  useful  to  the
customer  and  his/her
environment."  This  is  not
accomplished  by  talking  mostly 
with  the  management,  which  has 
been  our  past  practice.  It  is  the 
direct  exposure  to  the  daily  issues 
faced  by  the  engineering  design 
process  that  really  turns  on  the 
creative  juices.  It  is  what  enables
the  proverb  which  says  "necessity
is  the  mother  of  invention"  to 
operate! 

These  three  steps,  tearing  down  the
imprisoning  walls  of  the  value  system,
exposing  the  researcher  to  the  strategic
thinking,  and  exposing  him/her  to  the
engineering  world  so  as  to  enable  the
researcher  to  look  beyond  what  the  customer
says  he/she  needs,  together  form  the  one
means  that  I  have  found  to  be  consistently
successful  in  creating  ideas  for  research  that
have  high  relevancy.  Such  ideas  are
supercharged  by  bringing  to  bear  the  latest
and  greatest  in  enabling  technology  while
bathed  in  the  light  of  high  level  strategic
thinking.  This  is  what  we  must  strive  for  in
our  research  engine.

The  reason  that  one  must  proceed  to  a  fourth 
step  is  that,  at  this  stage,  the  customer  will 
generally  not  agree  with  or  approve  of  the 
ideas  and  plans  that  the  researchers  have 
formed  as  a  result  of  the  first  three  steps. 

Not  yet! 

The  reasons  are  several.  One  is  that  the 
differing  educational  background  of  an 
engineer  frequently  makes  it  impossible  for 
him  to  understand  the  approach  being 
proposed  by  the  researcher,  or  to  assess  the 
risk  involved  in  turning  ideas  into  reality. 
And  engineers  who  are  immersed  in  hot 
projects  are  much  more  focused  on  the 
answer  they  need  tomorrow  than  on  the
strategic  directions  of  interest  at  a  higher 
corporate  level.  They  tend  to  be  tactical 
thinkers.  But  support  and  enablement  of 
high  level  strategic  direction  is  what  head- 
of-the-food-chain  research  is  all  about! 

And  so  a  means  must  be  found  to  allow  the 
researchers  to  proceed  with  development  of 
their  ideas  in  the  face  of  customer 
opposition.  This  requires  an  act  of  courage
on  the  part  of  the  research  leader  and  the 
money  givers.  But  in  my  experience  it 
rarely  fails  to  produce  handsome  dividends. 
The  only  problem,  if  it  should  even  be  called 
a  problem,  is  that  at  this  stage  nobody  yet 
knows  in  exactly  what  form  that  dividend 
will  be  experienced.  That  appears  only  in 
step  four. 

Step  4  is  what  I  call  "vision  building."  This 
is  the  key  activity  that  converts  push  to  pull 
in  the  R&D  food  chain.  The  primary  cause 
of  failed  research  —  and  I  define  failed  to 
mean  research  that  doesn't  get  picked  up  and 
used  by  anybody  —  is  that  the  vision  from 
the  head  of  the  food  chain  that  propelled  the 
research,  and  the  vision  from  the  bottom  of 
the  food  chain  about  what  those  folks  think 
is  useful,  have  no  common  intersection.  If 
those  two  visions,  originating  from  opposite 
ends  of  the  food  chain,  cannot  be  made  to 
intersect,  the  research  will  not  be  accepted. 

It  will  be  ignored  by  the  people  who  call  the 
shots  in  determining  what  CFD  gets  used  in 
the  design  of  airplanes. 

And  so,  a  key  element  in  the  successful 
operation  of  an  R&D  food  chain  is  the 
process  that  I  call  "vision  building,"  a 
process  for  bringing  together  the  separate 
visions  that  originate  at  the  two  ends  of  the 
R&D  food  chain.  What  does  it  take? 
Throwing  publications  or  codes  over  the 
fence,  which  is  the  traditional  approach  to 
vision  building,  doesn't  work  well  at  all. 
Presenting  "gee  whiz"  papers  at  conferences 
doesn't  work.  Arguing  back  and  forth 
doesn't  work  well  either.  Neither  does 
voting.  I've  tried  them  all. 

What  works  is  for  the  research  community 
to  produce  something  that  an  engineer  can 
"touch  and  feel,"  usually  a  CFD  code 
capable  of  performing  a  small  number  of 
computations  that  illustrate  what  can  be 
done.  This  is  not  the  time  or  the  place  for 
well-documented  code,  user  friendly  input 
formats,  or  polished  and  orderly  software. 
Rather,  the  researcher  at  this  point  is 
engaged  in  a  race  to  discovery  and 
understanding  before  his  fragile  support 
system  runs  out  of  patience.  Shortcuts  are 
acceptable  and  encouraged,  with  one 
exception.  That  exception  is  execution 
efficiency.  This  is  one  of  the  key  measures 


that  will  be  "touched  and  felt"  and  usually 
should  not  be  compromised.

Some  people  (managers  and  software 
specialists  in  particular)  will  be  troubled 
with  the  idea  of  producing  code  that  is 
undocumented,  which  probably  does  not 
adhere  to  standards,  and  which  contains 
shortcuts.  That  is  because  they  interpret  the 
code  to  be  the  product.  They  fail  to  realize 
that  the  primary  product  of  research  at  this 
stage  is  vision,  not  code! 

The  best  way  I  found  to  build  vision  was  for 
the  researchers  to  again  return  to  the 
customer  site.  They  would  identify  real 
design  problems  being  faced  by  the  design 
engineers  and  they  would  set  up  and  run 
demonstrations  of  their  new  CFD
technology  on  those  problems.  This  led  to 
side-by-side  comparisons  of  new  versus  old 
ways  of  doing  things.  It  frequently  did  not 
contribute  much  at  that  point  to  the 
engineering  project's  near  term  design  goals 
because  the  code  was  still  developmental, 
fragile,  hard  to  use,  perhaps  containing  a  few 
bugs,  and  not  yet  trustworthy. 

What  it  did  do,  and  do  well,  was  to  build 
vision  within  the  minds  of  design  engineers. 
A  typical  reaction  to  a  set  of  these
calculations  would  be  "so  that  is  what  you
can  do!  Well,  if  you  add  this  and  that,  I  can
use  it  for  ___."  That  is  vision  building!  At
that  point  the  engineer  becomes  an  advocate
of  the  research.  This  is  when  "push"
changes  to  "pull"  in  the  R&D  food  chain.

The  other  thing  that  must  happen  is  that  the 
researcher  must  be  able  to  now  let  go  of  his 
original  vision,  the  one  that  led  him  to 
produce  the  CFD  technology  that  is  being
demonstrated.  He  must  allow  himself  to  be 
influenced  by  the  engineer-now-becoming- 
the-customer.  He  must  adopt  a  new  and 
better  vision. 

Vision  building  must  be  a  two  way  street.  It 
is  a  coming  together,  in  the  middle,  of  what 
were  originally  different  visions  at  opposite 
ends  of  the  food  chain.  It  is  not  for  one  end 
of  the  food  chain  to  convince  the  other  end 
that  its  vision  is  best.  It  demands  two-way 
communication.  It  is  intense.  It  requires 




face-to-face  interactions  over  a  period  of 
time.  It  demands  that  a  new  paradigm  of 
communication  be  built  into  the  research 
engine! 

This  is  the  type  of  vision  building  that 
generates  the  high  levels  of  motivation  and 
feelings  of  personal  worth  that  must  be 
present  in  a  good  research  engine.  It  results 
in  engineers  and  process  owners  anxiously 
awaiting  the  results  of  your  research,  calling 
you  to  find  out  how  things  are  going, 
offering  to  help  you,  and  telling  your  money 
giver  to  give  you  more  money.  It  creates 
passion.  It  also  causes  researchers  to  drive 
themselves  from  within  to  work  16  hours  a 
day,  7  days  a  week.  In  that  kind  of 
environment,  it  is  a  lot  of  fun  to  be  a 
researcher. 

Can  we  really  be  bold  enough  to  think  in 
terms  of  government  or  academic 
researchers  really  interacting  with  industry 
in  those  ways?  Well,  this  past  summer,  I 
and  the  Director  of  ICASE  (Institute  for 
Computer  Applications  in  Science  and 
Engineering),  Dr.  M.  Yousuff  Hussaini, 
conducted  an  experiment  in  communication. 
He  sent  one  of  his  research  staff,  Dr.
Michael  Lewis,  to  Boeing  for  seven  weeks.
One  of  those  weeks  was  spent  being  tutored 
in  the  teachings  of  competitiveness  and 
strategic  direction.  The  other  six  were  spent 
in  learning  and  observing  first  hand  what  the 
practice  of  business  acquisition,  engineering, 
design,  manufacturing,  and  customer 
support  was  all  about.  The  thing  that  he  was 
not  allowed  to  do  during  these  seven  weeks 
was  to  engage  in  research. 

At  the  end,  I  interviewed  Michael.  I  found 
that  he  had  learned  enough  to  be  able  to 
"look  beyond  what  industry  says  it  needs 
and  to  gain  an  understanding  of  what  he 
could  contribute  in  terms  of  research  that 
could  really  help  industry  but  that  we  were 
probably  unaware  of."  That  is  what  we  must 
strive  to  achieve  in  the  minds  of  all  research 
leaders  who  profess  to  be  working  at  the 
head  of  the  research  food  chain  in  areas  that 
are  related  to  aeronautics. 

I  cannot  envision  the  entire  population  of 
university  faculty  and  government 


laboratories  descending  upon  industry  sites
for  seven  weeks  each.  But  I  can  envision  a 
selected  subset  of  strategically  placed 
research  leaders  perhaps  doing  it.  And  if  we 
experiment  with  different  formats  and 
exposure  times,  we  can  probably  reduce  the 
exposure  time  significantly.  We  simply  must 
develop  a  new  paradigm  for  communication! 
Another  interesting  experiment  would  be  to 
provide  that  type  of  exposure  to  the  money 
givers  who  inhabit  the  research 
infrastructure. 

I  don't  yet  know  what  research  Michael 
Lewis  will  choose  to  work  on.  That  will  be 
his  decision.  In  any  event,  I  am  now 
contemplating  a  second  experiment  of 
inviting  him  back  for  a  try  at  vision  building 
whenever  his  research  has  progressed  to  the 
proper  state.  It  will  provide  him  with  the 
opportunity  to  expose  Boeing  people  to 
"touch  and  feel."  I  will  attempt  to  measure 
his  impact  on  the  change  in  vision  that  he  is 
able  to  create  within  Boeing  people,  and  I 
will  attempt  to  ascertain  how  his  own  vision 
has  been  caused  to  change.  I  will  look  for 
an  intersection  of  those  two  visions  as  a 
measure  of  the  effectiveness  of  his  research. 

In  my  view,  the  two  purposes  of 
communication  that  I  am  testing  with  the 
Michael  Lewis  experiments  are  the  key 
communications  that  we  must  build  into  the 
research  engine  of  tomorrow.  One  is  to 
communicate  strategic  alignment  and  a 
broad  understanding  of  the  customer  and 
his  environment.  The  other  is  to  provide  a 
means  for  vision  building,  the  process  of 
achieving  an  intersection  of  the  vision  from 
the  head  of  the  research  food  chain  with 
the  vision  from  the  engineering  trenches. 
That  is  the  process  that  converts  push  to 
pull  and  opens  the  door  to  industrial 
exploitation  of  research. 

The  other  component  of  the  research  engine 
that  will  be  particularly  influential  in  leading 
change  is  the  value  system.  We  simply  must 
find  a  way  to  tear  down  the  walls  that  are 
imprisoning  the  minds  of  many  of  our  most 
brilliant  people. 

Value systems cannot be created or even modified very much by
proclamation. It doesn't work to simply proclaim that we will now
adhere to a new and different set of values. In the long run, it is
the money givers within the research engine who have the only real
power over the value systems. Value is ultimately associated with
those endeavors that bring in money. That is true in industry, in
academia, and in government. It will be up to the money givers to do
the right things.
Abstract

The  paper  presents  an  overview  of  parallel  com¬ 
puting  in  computational  fluid  dynamics.  A  tax¬ 
onomy  of  parallel  computing  architectures  and 
programming  paradigms  is  described.  Issues  in 
parallel  computing  are  discussed  including  do¬ 
main  decomposition  and  load  balancing,  perfor¬ 
mance,  scalability,  benchmarks  and  portability. 
Examples of experience with parallel computing
in the aerospace industry are described.

1  Overview 

This paper is intended for researchers in Computational Fluid
Dynamics (CFD) who do not have experience in parallel computing. It
provides a description of parallel computing hardware architecture,
software paradigms, the principal issues in utilizing parallel
computing for CFD, and examples of use of parallel computing in the
aerospace industry.

Parallel  computing,  particularly  in  computa¬ 
tional  fluid  dynamics,  is  a  broad  field  of  research 
and  development.  The  software  and  hardware 
technology  is  developing  at  an  extraordinary 
pace. The reader is directed to the numerous journals on parallel
computing (e.g., Inter. Journal of High Speed Computing, The Journal
of Supercomputing, Inter. Journal of Parallel Programming, Inter.
Journal of Supercomputer Applications), as well as recent conferences
and workshops (e.g., Parallel CFD ’95), for further information.
Additionally, extensive information is available on the World Wide
Web, e.g., http://www.cnb.compunet.de/para/para.html,
http://www.netlib.org/nhse/.


2  What  is  Parallel  Computing? 

This section presents an anecdotal discussion of the earliest
reference to parallel computing, describes Flynn’s and Bell’s
classifications of parallel computer architectures, and briefly
discusses the message passing and data parallel programming
paradigms.

2.1  Introduction 

Parallel computing is the simultaneous operation of multiple
computational tasks on a computer system. Parallel computing has been an
integral  part  of  computing  systems  from  their 
beginning.  The  earliest  reference  to  parallel 
computing  appears  to  be  the  description  by 
L.  Menabrea  of  Charles  Babbage’s  computer. 
Among  the  principal  virtues  of  an  earlier  (but 
evidently  not  final)  design,  Menabrea  describes 
the  capability  (and  importance)  of  parallel  com¬ 
puting  [1]: 

“. . .  Secondly,  the  economy  of  time:  to 
convince  ourselves  of  this,  we  need  only 
consider  that  the  multiplication  of  two 
numbers,  consisting  each  of  twenty  fig¬ 
ures,  requires  at  the  very  utmost  three 
minutes.  Likewise,  when  a  long  series 
of  identical  computations  is  to  be  per¬ 
formed,  such  as  those  required  for  the 
formation  of  numerical  tables,  the  ma¬ 
chine  can  be  brought  into  play  so  as  to 
give  several  results  at  the  same  time, 
which  will  greatly  abridge  the  whole 
amount  of  the  processes.” 

Also, the first general purpose electronic digital computer ENIAC,
built to compute projectile and firing tables for the US Army in World
War II, was a parallel computer with 25 independent computing units
(20 accumulators, 1 multiplier, 1 divider/square rooter, and 3 table
look-up units) performing different tasks for the solution of the
specific problem. Moreover, the ENIAC used decimal arithmetic
internally (as opposed to the binary arithmetic used on modern
computers) and operated on all ten decimal digits of a number in
parallel. The ENIAC was programmed in hardware, i.e., using a
plugboard to wire connections between the units. However, the parallel
computing capability of the ENIAC was never fully realized in
practice. After two years of operation, it was reconfigured as a
serial centralized computer [2].

There are four distinct levels of parallelism [1]. The highest level
is job, where the computer system operates simultaneously on unrelated
tasks (e.g., a CFD simulation for an F-18 and a CEM simulation for a
B-2). The second level is program, where the computer system operates
simultaneously on different parts of the same program (e.g., the
parallelization of a DO loop across multiple processors). The third
level is instruction, where different instructions are performed in
parallel (e.g., fetching one instruction from memory while performing
an arithmetic operation). The fourth level is arithmetic and bit,
where parallelism is achieved within an individual arithmetic or bit
instruction. This paper focuses on the second level (program) of
parallelism in computational fluid dynamics. We consider the issues of
parallelism in the context of a single program (e.g., the simulation
of a combustion chamber) operating on a parallel computer.

2.2  Classification  of  Parallel  Computer  Architectures

Flynn [3] originated a classification of parallel architectures which
has become widely accepted (Table 1). Four distinct categories are
defined based on the data stream, which is the sequence of
instructions and/or data executed or operated on by a processor.
Single Instruction Stream/Single Data Stream (SISD) is the
conventional serial architecture employing a single stream of data and
a single processor. This is also known as the von Neumann computer (or
architecture) or a serial computer. Modern single-processor
workstations or micro-computers are examples of this category. Single
Instruction Stream/Multiple Data Stream (SIMD) computers have several
computational units which can perform the same operation (e.g., adding
two numbers) simultaneously on different parts of the data stream. An
example is the Cray C-90. Multiple Instruction Stream/Single Data
Stream (MISD) implies simultaneous different operations by separate
computational units on the same data stream. Examples of this type are
rare. Multiple Instruction Stream/Multiple Data Stream (MIMD)
indicates multiple computational units operating simultaneously on
multiple data streams. Examples are the Thinking Machines Corporation
CM-5, the Cray T3D and, indeed, the ENIAC.

Table 1: Flynn’s Taxonomy

Acronym   Definition
SISD      Single Instruction Stream - Single Data Stream
SIMD      Single Instruction Stream - Multiple Data Stream
MISD      Multiple Instruction Stream - Single Data Stream
MIMD      Multiple Instruction Stream - Multiple Data Stream


Figure  1:  Bell’s  taxonomy  of  MIMD  architec¬ 
tures  (with  examples) 

Flynn’s classification, although useful for broadly categorizing
parallel computers and widely cited, is nonetheless incomplete, and
various other classifications have been introduced. Bell [4]
subdivides the MIMD category into two subcategories as indicated in
Fig. 1. Multiprocessors are parallel computers with a single address
memory (shared memory), i.e., the central memory (RAM) is organized
into a single logical address domain which is accessible to all of the
processors (Fig. 2). Processors P1, ..., Pn can access the same data
in memory (i.e., the same address location), albeit not
simultaneously. Examples are the Cray C-90 and SGI Power Challenge XL.
Multicomputers are parallel computers with multiple distributed memory
address spaces (Fig. 2). Processors P1, ..., Pn have dedicated,
independent memories M1, ..., Mn which are not directly accessible by
each other. Examples are the Intel Paragon, IBM SP2 and networks of
individual workstations. If processor P1 needs to access data in the
memory assigned to processor Pn, it sends a message to Pn requesting
the data, and Pn complies. The transfer of data from the memory of one
processor to the memory of another is denoted message passing, and is
a principal characteristic of multi-computers. All communications
between processors occur through a communications network C in Fig. 2.
Many different types of communications network topologies have been
developed (see Fig. 3 of Bell [4]).

Figure 2: Multiprocessor and Multicomputer (a: Multiprocessor, b: Multicomputer)
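The request and reply exchange between P1 and Pn can be sketched in a few lines. This is only an illustration, written in Python with the standard library: two threads stand in for the processors, and `queue.Queue` objects stand in for the communications network C. The names `serve`, `fetch` and `demo`, the message layout, and the stored value are all hypothetical and are not taken from any message passing library.

```python
import queue
import threading


def serve(rank, memory, inboxes):
    """Answer 'request' messages with data from this rank's private memory."""
    while True:
        kind, sender, key = inboxes[rank].get()      # blocking receive
        if kind == "stop":
            return
        inboxes[sender].put(("reply", rank, memory[key]))  # send the data back


def fetch(my_rank, owner, key, inboxes):
    """Rank my_rank obtains memory[key] held by rank owner via messages."""
    inboxes[owner].put(("request", my_rank, key))    # send the request
    kind, source, value = inboxes[my_rank].get()     # wait for the reply
    assert kind == "reply" and source == owner
    return value


def demo():
    # One inbox per processor; the dict of queues plays the role of network C.
    inboxes = {0: queue.Queue(), 1: queue.Queue()}
    memory_1 = {"pressure": 101325.0}                # private to rank 1
    worker = threading.Thread(target=serve, args=(1, memory_1, inboxes))
    worker.start()
    value = fetch(0, 1, "pressure", inboxes)         # rank 0 never reads memory_1
    inboxes[1].put(("stop", 0, None))
    worker.join()
    return value
```

Rank 0 never touches `memory_1` directly; every access costs one message in each direction, which is the overhead that a shared memory multiprocessor avoids.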

The relative advantages and disadvantages of multi-processors vs.
multi-computers have been widely studied, and numerous research (and
production) machines of both types have been constructed [4].
Although greatly oversimplified, the main issues are as follows. For a
multi-processor, the shared memory eliminates the computational cost
and program complexity of message passing. However, a multi-processor
with a single shared memory is not scalable, i.e., the architecture
cannot simply be scaled to an arbitrary number of processors and
arbitrary memory size. This arises from the limitation on data
transfer rate (bandwidth) between memory and processors. This has led
to a subdivision of multiprocessors into two categories, the central
memory multi-processors as described previously, and the distributed
memory multi-processors (Fig. 1) where the independence of the
distributed memories is hidden from the user by means of an automatic
data transfer mechanism (caching). For a multi-computer, the
distributed memory eliminates the scalability problem associated with
a single memory of limited bandwidth. However, multi-computers incur
the computational cost and program complexity of message passing.

Other  classifications  of  parallel  computers  have 
been  developed,  e.g.,  Shore  [5],  and  Hockney 
and  Jesshope  [1]. 

2.3  Parallel  Programming 

There are two basic types of parallel programming paradigms (or
environments). As the name suggests, message passing involves the
explicit use of send and receive functions by the applications
programmer. These functions communicate information between the
memories assigned to individual processors. Many manufacturers of
distributed memory parallel computers have developed specialized
message passing libraries (e.g., nCUBE, Intel), although standards are
emerging (see §3.5.2 and 3.5.3). The data parallel paradigm involves a
single program which controls the distribution of data across all
processors, and the operations on the data. Typically, the data
parallel language supports array operations and permits entire arrays
to be used in expressions. Manufacturers of shared memory parallel
computers have developed specialized compiler directives for data
parallel programming (e.g., Cray C-90 and SGI). An emerging standard
for a data parallel language is High Performance Fortran (§3.5.1).

2.4  Examples  of  Parallel  Computers 

Table 2 lists a number of current parallel computers. It should be
emphasized that the information shown does not fully describe the
capabilities (and limitations) of a parallel computer. Other relevant
factors include memory bandwidth, cache memory, I/O bandwidth,
compiler technology, debugging software, etc. Furthermore, the
performance specifications change frequently due to product upgrades.
Table 2: Examples of Current and Future Parallel Computers

Name                                Class  Max No. of   MFlops/     Max Memory  Type of
                                           Processors   Processor   (GByte)     Memory
Convex SPP1200/XA                   MIMD   128          240         32          Shared
Cray J-90                           SIMD   32           200         8           Shared
Cray C-90                           SIMD   16           1000        2           Shared
Cray T-90                           SIMD   32           2000        8           Shared
Cray T3D                            MIMD   2048         150         128         Distributed
Cray T3E (2Q96)                     MIMD   2048         600         1024        Distributed
DEC 8400 5/300                      SIMD   12           600         14          Shared
Fujitsu VPP300                      SIMD   16           2200        32          Distributed
IBM SP-2                            MIMD   128          266         256         Distributed
Intel Paragon XP/S 35               MIMD   512          150         16          Distributed
NCUBE-2                             MIMD   4000         4           250         Distributed
NCUBE-3 (Dec 95)                    MIMD   12000        100         3000        Distributed
SGI Power Challenge XL              SIMD   18           360         16          Shared
Thinking Machines CM-5              MIMD   512          160         64          Distributed
Thinking Machines CM-500 (Fall 95)  MIMD   2048         160         256         Distributed

LEGEND

GByte   Gigabyte (10^9 bytes)
MFlops  Millions of floating point operations per second (theoretical maximum)

NOTES

1. Maximum Number of Processors may refer to processing elements on some systems.

2. Memory does not include secondary memory storage (e.g., Solid-State Storage Device (SSD)
on the Cray C-90/T-90).

3. Dates in parentheses indicate manufacturer’s published date for availability.
3  Issues  in  Parallel  Computing

Effective  utilization  of  parallel  computing  in 
computational  fluid  dynamics  involves  numer¬ 
ous  issues  which  must  be  adequately  addressed. 
In  this  section,  we  focus  on  several  key  questions, 
in  the  context  of  development  of  new  codes  for 
parallel  computing. 

3.1  Domain  Decomposition  and  Load 
Balancing 

The partitioning of data and computational tasks among multiple
processors is denoted domain decomposition. An example is shown in
Fig. 3. A two-dimensional structured grid for a jet engine nozzle is
partitioned into subdomains, and each subdomain is assigned to an
individual processor. This approach typifies the domain decomposition
for a multi-computer. The domain decomposition may occur prior to or
during the execution of the flow code.


Figure  3:  Multi-block  grid  (from  [46]) 

The principal objective of domain decomposition is to maintain uniform
computational activity on all processors. This is known as load
balancing. For a fixed numerical algorithm (e.g., the Euler equations)
on a fixed grid, load balancing is straightforward, i.e., each
processor is assigned approximately the same number of cells. However,
several factors can complicate load balancing. First, the nature of
the governing equations can change during the computation. An example
is combustion, where the chemical reaction source terms are computed
only when the local static temperature exceeds a preset value [6, 7].
Second, the number of governing equations in a given subdomain can
change. An example is particle tracking where particles can accumulate
in a subregion (e.g., recirculation zone). Third, the grid can change
during the computation due to adaptation. Thus, in many cases it is
necessary to incorporate dynamic load balancing, wherein the load on
each processor is monitored and the overall task load redistributed to
achieve an approximately uniform load.

An  example  of  a  simple  dynamic  load  balancing 
method  is  presented  in  Borrelli  [6]  for  hypersonic 
reacting  flow.  The  chemical  reactions  are  im¬ 
portant  only  when  the  local  static  temperature 
exceeds  2000  deg  K,  and  the  ratio  of  computa¬ 
tional  work  for  reacting  vs.  non-reacting  flow  is 
approximately  ten.  The  dynamic  load  balancing 
algorithm  decomposes  the  domain  by  assigning 
a  weighting  function  of  either  1  or  10  to  each 
cell,  corresponding  to  non-reacting  and  react¬ 
ing,  respectively,  and  subdividing  the  domain  to 
achieve  an  approximate  uniform  average  weight¬ 
ing  function  for  each  subdomain. 
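The weighting idea can be illustrated with a short sketch. It is not the algorithm of Borrelli [6]: it assumes the cells are already ordered along a line, and it splits that ordering into contiguous ranges of roughly equal total weight, which is one simple reading of the uniform-load goal. The function names and the sample temperatures are hypothetical; the weights 1 and 10 and the 2000 K threshold are the values quoted above.

```python
def cell_weights(temperatures, threshold=2000.0, reacting_cost=10, inert_cost=1):
    """Weight 10 for cells hot enough to react, 1 for all other cells."""
    return [reacting_cost if t > threshold else inert_cost for t in temperatures]


def partition(weights, n_parts):
    """Split cells 0..len-1 into n_parts contiguous ranges of similar weight.

    Returns a list of (start, stop) index pairs, one per processor.
    """
    total = sum(weights)
    bounds = [0]
    running = 0
    for i, w in enumerate(weights):
        running += w
        # Close a range each time the cumulative weight passes the next
        # multiple of total / n_parts.
        while len(bounds) < n_parts and running >= total * len(bounds) / n_parts:
            bounds.append(i + 1)
    while len(bounds) < n_parts + 1:
        bounds.append(len(weights))
    return [(bounds[k], bounds[k + 1]) for k in range(n_parts)]


def loads(weights, parts):
    """Total weight assigned to each processor."""
    return [sum(weights[a:b]) for a, b in parts]


# Hypothetical field: 20 cold cells followed by 2 reacting cells.
sample_weights = cell_weights([500.0] * 20 + [2500.0] * 2)
sample_parts = partition(sample_weights, 2)
```

With equal cell counts the second processor would carry 29 of the 40 work units; the weighted split gives each processor 20. In a dynamic scheme the weights would be recomputed as the temperature field evolves and the partition repeated.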

3.2  Performance 

A key issue is the performance of a CFD code on a parallel computer.
Many different measures of performance have been proposed, and there
is an active debate regarding the most appropriate. However, in
solving a given problem (e.g., viscous flow past an F-18), the true
measure of performance is simply the wall clock time to completion. In
other words, given the opportunity to choose among different
computational resources, the individual typically selects the resource
which yields the answer in the shortest elapsed time, subject to
existing constraints (e.g., budget, system load, etc.).

Of course, it is impossible to model this selection process in a
universal manner, and thus the development of performance measures has
focused principally on more ideal cases. One performance measure is
megaflops (i.e., millions of floating point operations per second) vs.
number of processors. An example is presented in Fig. 4 from Simon et
al. [8] for two different codes: a 2-D unstructured Euler code [9],
and a 3-D particle simulation code for rarefied gas flows [10]. Both
codes were executed on an Intel iPSC/860 multicomputer for 2^n
processors where n = 1, ..., 7. The unstructured Euler code achieves a
substantially higher megaflop performance than the particle code.

Figure 4: Megaflops of Two Codes on the Intel iPSC/860 (abscissa: No. of Processors)

Another performance measure is efficiency, i.e., the fraction of the
peak performance (relative to a single processor) achieved on a
machine by a specific code. It is defined as

$$\eta = \frac{\text{CPU time for one processor}}{n \times \text{CPU time for } n \text{ processors}} \qquad (1)$$

Typically, the efficiency $\eta$ is plotted against the number of
processors $n$. A related quantity is the speedup $S$ defined as

$$S = n \eta \qquad (2)$$

Efficiency can depend strongly on the algorithm. An example is
presented in Fig. 5 from Simon et al. [8] for the same codes as in
Fig. 4. Here the trend is opposite to the megaflop performance
measure, i.e., the 3-D particle code retains 88% efficiency at
n = 128, while the 3-D unstructured code drops to 52% at n = 128.

Figure 5: Efficiency of Two Codes on the Intel iPSC/860 (abscissa: No. of Processors)

3.3  Scalability

The impact of scaling a given parallel computer architecture to an
increasingly larger number of processors is a key concern. Although
this problem may be viewed from several perspectives, it is
instructive to examine it in the following context. Consider the
solution of a given computational fluid dynamics problem, e.g., a
Reynolds-averaged Navier-Stokes simulation of an entire aircraft
configuration using a fixed number of grid points. How does the
efficiency of the computation depend on the number of processors?
This question may be treated (albeit simplistically) by a
straightforward analysis proposed by Amdahl [11]. Denote the execution
time of the program on a single processor by $t_1$. Assume that an
analysis of the program and algorithm structure indicates that a
portion of the code could be reprogrammed for parallel execution
(e.g., the product of a matrix and a vector, which is a common
operation in iterative methods for solution of linear systems). Let
$t_p$ denote the CPU time on the single processor for this potentially
parallelizable section. Let $t_s$ denote the CPU time for the
remaining unparallelizable (i.e., scalar) code, so that
$t_1 = t_s + t_p$. Neglecting the cost of scheduling processors,
communications between processors (if any) and synchronization time
(i.e., the time required to allow all processors to reach a common
point following execution of the parallel section of code), the
execution time on $n$ processors is $t_n = t_s + t_p/n$ and the
efficiency of a parallel computation with $n$ processors is

$$\eta = \frac{t_1}{n\,t_n} = \frac{t_s + t_p}{n\,(t_s + t_p/n)} \qquad (3)$$

and defining the parallelizable fraction $f = t_p/(t_s + t_p)$,

$$\eta = \frac{1}{f + n(1 - f)} \qquad (4)$$

This is known as Amdahl’s Law and is displayed in Fig. 6. The
precipitous drop in efficiency for all but the highest possible
parallelizable fractions is strikingly clear. Even for $f = 0.99$, the
efficiency $\eta$ is 0.5 at $n = 101$.
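Amdahl’s Law is easy to evaluate numerically. The sketch below assumes the form $\eta = 1/(f + n(1 - f))$ for the efficiency, together with the speedup $S = n\eta$ from Eq. (2); the function names are illustrative only.

```python
def amdahl_efficiency(f, n):
    """Efficiency on n processors for parallelizable fraction f (0 <= f <= 1)."""
    return 1.0 / (f + n * (1.0 - f))


def amdahl_speedup(f, n):
    """Speedup S = n * efficiency, as in Eq. (2)."""
    return n * amdahl_efficiency(f, n)


def efficiency_table(f, processor_counts):
    """List of (n, efficiency, speedup) tuples for quick inspection."""
    return [(n, amdahl_efficiency(f, n), amdahl_speedup(f, n))
            for n in processor_counts]
```

For example, `amdahl_efficiency(0.99, 101)` evaluates to 0.5 up to rounding, matching the value quoted in the text, and the speedup for $f = 0.99$ can never exceed $1/(1 - f) = 100$ however many processors are used.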

Figure 6: Amdahl’s Law

In some cases, the communications cost may yield even lower
efficiencies than predicted by Amdahl’s Law. Consider a fixed domain
$\mathcal{D}$ of $N^3$ cells on a multi-computer with $n$ processors.
Assume an equi-distribution $\mathcal{D}_k$, $k = 1, \ldots, n$ of
$N^3/n$ cells to each processor. Typically, a halo of fictitious cells
is added to each processor, representing the additional information
necessary to integrate the flow variables within cells assigned to the
processor by a single time step. The number of halo cells is
proportional to the number of cells in $\mathcal{D}_k$ which share one
or more faces with other subdomains $\mathcal{D}_l$, $l \ne k$, and is
therefore $O((N^3/n)^{2/3})$. The ratio of communications time to
flowfield integration time, denoted by $\zeta$, is therefore

$$\zeta \propto \frac{O\left((N^3/n)^{2/3}\right)}{N^3/n} = O\left(\left(\frac{1}{N^3/n}\right)^{1/3}\right) \qquad (5)$$

Thus, for a fixed number of cells, the relative cost of communications
can increase as the number of processors increases¹.
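The surface-to-volume argument can be checked with a small calculation. The sketch below assumes each subdomain is a cube with a halo one cell deep on each of its six faces; that geometry, the factor of six, and the function name are assumptions made for illustration and are not part of the analysis above.

```python
def halo_to_interior_ratio(n_cells_total, n_procs):
    """Ratio of halo cells to interior cells for one cubic subdomain.

    Assumes n_cells_total cells are split evenly into n_procs cubes,
    each with a halo one cell deep on all six faces.
    """
    cells_per_proc = n_cells_total / n_procs
    side = cells_per_proc ** (1.0 / 3.0)      # cells along one edge of the cube
    halo_cells = 6.0 * side ** 2              # one layer on each of six faces
    return halo_cells / cells_per_proc        # scales as (N^3 / n) ** (-1/3)


# Fixed grid of 64^3 cells on 8 and then 64 processors.
ratio_8 = halo_to_interior_ratio(64 ** 3, 8)
ratio_64 = halo_to_interior_ratio(64 ** 3, 64)
```

Increasing the processor count by a factor of eight on the fixed grid doubles the ratio (from about 0.19 to about 0.38), in line with the $(N^3/n)^{-1/3}$ scaling.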


3.4  Benchmarks

Numerous benchmarks have been developed for parallel computers². All
benchmarks have limitations, of course, and the overemphasis on (and
misuse of) benchmarks has naturally led to a somewhat skeptical
attitude towards them. This is perhaps best epitomized by Bailey’s
“Twelve Ways to Obfuscate the Performance of a Parallel Machine” [12].

Nevertheless, benchmarks provide insight into the relative performance
of different parallel computers. One of the most widely cited is the
NAS Parallel Benchmarks [13, 14], which includes five kernels (two
dimensional statistics from a Gaussian pseudo-random number generator,
multigrid 3-D Poisson equation, conjugate gradient method computation
of the smallest eigenvalue of a large sparse symmetric positive
definite matrix, 3-D Fast Fourier Transform, and integer sort) and
three simulated CFD applications (SSOR algorithm for block 5x5 system,
scalar pentadiagonal system, and block tridiagonal system). The NAS
Parallel Benchmarks are described algorithmically, rather than in a
specific programming language³. They have been executed on numerous
machines including Convex Exemplar SPP1000, Cray C90/T90/J90/T3D, DEC
Alpha Server 8400, Fujitsu VPP500, IBM SP2 (Thin and Wide Node) and
SGI Power Challenge XL. Saini and Bailey [16] make several
observations. These include 1) the performance per unit cost (e.g.,
MFlops per dollar) of the Cray C-90 was the lowest of all systems
tested⁴, and 2) all vendors employed their own specialized
parallelization directives to achieve maximum performance. Future
enhancements to the NAS Parallel Benchmarks include the development of
a version using High Performance Fortran and Message Passing Interface
(see below).

3.5  Portability

In recent years, significant effort has been devoted to the
development of standardized environments for development of parallel
codes. Three specific areas of activity are discussed here, namely,
development of a standard Fortran for parallel computing (HPF), a
standard for heterogeneous, network-based parallel computing
environments (PVM), and the more recently developed standard
message-passing interface MPI. There are many other similar research
efforts in progress; however, space does not permit their discussion
here.

¹ An alternate definition of efficiency (denoted as scaled efficiency,
and its counterpart, scaled speedup) has been proposed whereby the
ratio of communications cost to computational cost remains fixed as n
is increased. This is achieved by increasing the problem size (i.e.,
N) with the number of processors. From the above analysis, this
implies that $N^3 \sim n$.

² A compendium of benchmark reports is available at
http://performance.netlib.org/performance/html/PDSreports.html.

³ In contrast, for example, to the LINPACK benchmark [15] for the
matrix of order 100, which is written in FORTRAN and which may not be
modified, including the comment statements.

⁴ The system cost is assumed to be the list price.
3.5.1  High  Performance  Fortran

High Performance Fortran [17, 18, 19] is a data parallel language
which extends Fortran 90 to provide additional support for the data
parallel programming style while maintaining compatibility⁵ with
Fortran 90. Development of HPF was initiated in 1991 through the
establishment of the High Performance Fortran Forum, and the language
specification was published in May 1993. At present, twelve vendors
have announced support of HPF. Additional information may be obtained
at http://www.erc.msstate.edu/hpff/home.html.

A good introduction to HPF is provided by Foster [21].

HPF extends Fortran 90 to include specific compiler directives to
control the alignment and distribution of data on parallel machines,
and introduces new parallel features and additional intrinsic library
functions. For example, the PROCESSORS directive specifies the shape
and size of an array of (abstract) processors, and the ALIGN directive
aligns elements of different arrays with each other, thereby
indicating that they should be distributed across processors in the
same manner. New intrinsic functions introduced by HPF include
NUMBER_OF_PROCESSORS and PROCESSORS_SHAPE, which allow a program to
obtain information on the number of processors on which it executes
and the connection topology.
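The effect of distributing and aligning arrays can be pictured without any HPF compiler. The Python sketch below is only an analogy: it assumes a block distribution in which each abstract processor owns one contiguous block of ceil(N/P) elements, and it is not HPF syntax. The function names are hypothetical.

```python
import math


def block_owner(index, n_elements, n_procs):
    """Abstract processor owning element `index` (0-based) under an assumed
    block distribution with block size ceil(n_elements / n_procs)."""
    block = math.ceil(n_elements / n_procs)
    return index // block


def owners(n_elements, n_procs):
    """Owner of every element of an array of length n_elements."""
    return [block_owner(i, n_elements, n_procs) for i in range(n_elements)]


def remote_reads_for_shift(n_elements, n_procs):
    """Count elements i whose left neighbour i-1 lives on another processor.

    For two arrays aligned element by element, a[i] + b[i] needs no remote
    data, whereas a[i] + b[i-1] needs one remote read per block boundary.
    """
    own = owners(n_elements, n_procs)
    return sum(1 for i in range(1, n_elements) if own[i] != own[i - 1])
```

For ten elements on four processors, `owners(10, 4)` places three elements on each of the first three processors and one on the last, and a one-element shift crosses three block boundaries.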

Examples  of  applications  written  in  HPF  are 
presented  in  Hawick  and  Fox  [22]  and  Mueller 
and  Ruehl  [23].  A  more  extensive  list  is  available 
on http://www.npac.syr.edu/hpfa/bibl.html.

3.5.2  Parallel  Virtual  Machine  (PVM) 

A recent major advancement is the development of heterogeneous,
network-based parallel computing environments. Unlike fixed parallel
computer architectures (e.g., Cray C-90, Intel Paragon, etc.), these
network-based parallel computers are created as a virtual machine
using software tools such as PVM, Linda, P4 or Express. Typically, any
number of different networked computers may be connected to form a
parallel machine, although usually the computers are fairly similar.
Below, we provide a brief description of PVM. Descriptions of other
systems are available (e.g., [24, 25, 26]), and a reasonably
comprehensive listing has been compiled by Turcotte [27]. Comparisons
of the relative merits of different systems have also been published
(e.g., [28]).

⁵ For a description of Fortran 90, see Metcalf and Reid [20].

PVM (Parallel Virtual Machine), created by the Heterogeneous Network
Project (Oak Ridge National Laboratory, the University of Tennessee
and Emory University) initiated in 1989, consists of two software
packages [29, 30, 31, 32, 33]. The first is a daemon pvmd3 which
executes on all of the computers which comprise the virtual parallel
machine. PVM is designed to enable any user with a valid login to
install and initiate pvmd3. The user specifies a list of computers
which comprise the virtual parallel machine, and starts pvmd3 on each
one. The PVM application can then be initiated from any of the
computers. The second is a library of PVM routines, libpvm3.a, which
contains the user callable routines for message passing, spawning
processes, coordinating tasks and modifying the virtual machine.

PVM  has  been  successfully  implemented  on  nu¬ 
merous  computer  architectures  [33].  These  in¬ 
clude  heterogeneous  and  homogeneous  networks 
of  computers,  and  also  “individual”  massively 
parallel  computers  (e.g.,  Intel  Paragon  and  Cray
T3D).  PVM  is  widely  utilized  in  academia,  in¬ 
dustry  and  government  laboratories.  It  is  es¬ 
timated  that  more  than  10,000  individuals  or 
installations  have  obtained  the  PVM  software 
and  approximately  20%  to  25%  are  actively  us¬ 
ing  it  [34].  An  index  of  PVM  software  may  be 
obtained  by  sending  the  message  send  index 
from  pvm3  to  netlib@ornl.gov. 

An example of a PVM application is the Korringa, Kohn and Rostoker
coherent potential approximation (KKR-CPA) method for computing the
electronic properties, energetics and other ground state properties of
substitutionally disordered alloys [33]. An effort of approximately
three months converted the 20K line KKR-CPA code for PVM. The code
achieved approximately 200 MFlops using a network of ten IBM RS/6000
workstations (6 model 530’s and 4 model 320’s), which is estimated to
be approximately 82% of the maximum floating point capability of this
virtual system. Also, the PVM KKR-CPA code achieved over 9 GFlops
performance using a network of twenty seven Cray C-90 and Cray Y-MP
processors scattered across several sites. Furthermore, the PVM
KKR-CPA code was successfully demonstrated for a virtual machine
consisting of two Intel Paragons, a CM-5, an Intel i860 and IBM
workstations, which were geographically distributed at several sites.

Load balancing, latency and bandwidth are
clearly important issues for implementation of
a virtual machine with PVM or other similar
tools. In a heterogeneous environment, due
consideration of the relative performance of
individual hosts is obviously needed in domain
decomposition. Latency (i.e., the time required to
initiate a message) can be a critical issue,
depending on the ratio of communication to
computation. Network bandwidth may be restricted
due to existing traffic. Recent enhancements
to PVM [34] provide for improved performance.
For example, the message passing performance
of PVM on the Intel Paragon⁶ is only 5% to 8%
slower than the native functions [34].
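
The two effects discussed above can be illustrated with a minimal Python sketch. It is not part of PVM; the function names, the host speeds and the network figures are assumptions chosen only for illustration. The first function is the usual latency-plus-transfer cost model for a single message, and the second distributes grid cells among hosts in proportion to their relative speeds.

```python
def message_time(n_bytes, latency_s, bandwidth_bytes_per_s):
    """Simple cost model: start-up latency plus transfer time."""
    return latency_s + n_bytes / bandwidth_bytes_per_s


def partition_cells(n_cells, host_speeds):
    """Split n_cells among hosts in proportion to their relative speeds.

    The largest-remainder rule guarantees that the shares sum to n_cells.
    """
    total = sum(host_speeds)
    exact = [n_cells * s / total for s in host_speeds]
    shares = [int(x) for x in exact]
    leftover = n_cells - sum(shares)
    # Hand the remaining cells to the hosts with the largest fractional parts.
    order = sorted(range(len(exact)),
                   key=lambda i: exact[i] - shares[i], reverse=True)
    for i in order[:leftover]:
        shares[i] += 1
    return shares


# Hypothetical virtual machine: two hosts twice as fast as the other two.
shares = partition_cells(120_000, [2.0, 2.0, 1.0, 1.0])

# Hypothetical network: 1 ms latency, 1 MB/s effective bandwidth.
t_small = message_time(100, 1.0e-3, 1.0e6)        # dominated by latency
t_large = message_time(1_000_000, 1.0e-3, 1.0e6)  # dominated by bandwidth
```

With these assumed figures the short message costs almost exactly the latency, while the long message costs almost exactly the transfer time, which is why the ratio of communication to computation decides which of the two limits matters.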

3.5.3  Message  Passing  Interface  (MPI) 

MPI  (Message  Passing  Interface)  is  a  mes¬ 
sage  passing  standard  for  homogeneous  and  het¬ 
erogeneous  parallel  and  distributed  computing 
systems.  The  development  of  the  MPI  standard 
is  a  multinational  effort  which  was  initiated  in 
1992  and  is  supported  by  ARPA,  NSF  and  the 
Commission  of  the  European  Community.  The 
MPI  standard  was  published  in  1994  and  is  de¬ 
scribed  in  [35,  36,  37].  A  good  introduction  to 
MPI  is  provided  by  Foster  [21],  and  a  brief  de¬ 
scription  is  presented  in  [38]. 

An  MPI  program  includes  one  or  more  processes 
which  communicate  with  each  other  through 
calls  to  MPI  library  routines.  There  are  two 
types  of  communications,  namely,  point-to- 
point  communication  between  pairs  of  processes, 
and  collective  communication  between  groups  of 
processes.  Several  variants  of  “send”  functions 
are provided to enable users to achieve peak
performance. Two basic types of communication
topologies are provided: a Cartesian grid and an
arbitrary process graph [38].
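
As an illustration of the Cartesian grid topology, the following pure Python sketch mimics the bookkeeping that such a topology provides. It does not call MPI; the function names are hypothetical. It relies on the fact that MPI numbers the processes of a Cartesian topology in row-major order, and it uses `None` where an MPI program would receive `MPI_PROC_NULL`.

```python
def rank_to_coords(rank, dims):
    """Row-major mapping from rank to grid coordinates
    (the last coordinate varies fastest)."""
    coords = []
    for extent in reversed(dims):
        coords.append(rank % extent)
        rank //= extent
    return tuple(reversed(coords))


def coords_to_rank(coords, dims):
    """Inverse of rank_to_coords."""
    rank = 0
    for c, extent in zip(coords, dims):
        rank = rank * extent + c
    return rank


def neighbor(rank, dims, periodic, axis, step):
    """Rank of the process displaced by `step` along `axis`.

    Returns None when the displacement leaves a non-periodic grid.
    """
    coords = list(rank_to_coords(rank, dims))
    c = coords[axis] + step
    if periodic[axis]:
        c %= dims[axis]
    elif not 0 <= c < dims[axis]:
        return None
    coords[axis] = c
    return coords_to_rank(coords, dims)


# A 2 x 3 process grid, periodic only in the second direction.
dims, periodic = (2, 3), (False, True)
east_of_4 = neighbor(4, dims, periodic, axis=1, step=1)
south_of_4 = neighbor(4, dims, periodic, axis=0, step=1)
```

In a structured multi-block flow solver, such neighbor ranks would be the partners of the point-to-point exchanges at block interfaces.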

⁶ Using the functions pvm_psend() and pvm_precv()
introduced in PVM Version 3.3.


Due to its recent introduction, there is a
relatively small number of applications to date
using MPI. A recent review by Skjellum, Lusk
and Gropp [39] describes recent applications
including unsteady incompressible viscous flows,
groundwater modeling, volume visualization
and traffic simulation. Native MPI implementations
are currently under development by several
parallel computer vendors [40].

4  Parallel  Computing  in 
Aerospace  Research 

Despite the extensive research on parallel
computing, only a small fraction of numerical
simulations of aerospace research problems employ
parallel computing. A survey of the citations
for parallel and other computers for three
journals is presented in Table 3. The period July
1993 through July 1995 was surveyed for all
articles presenting research involving significant
numerical simulation. Approximately 44% of
these articles indicated that a serial or vector
machine (single processor) was employed, while
only 3.4% specifically noted that a parallel
computer was used. Approximately 52% did not
indicate the machine used. If the statistics for
the first two categories are assumed statistically
representative of the last group, then an overall
estimate (upper bound) for the parallel
applications is 7%.
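
The arithmetic behind these percentages can be checked with a few lines of Python. The counts are the totals from Table 3; the variable names are chosen here for illustration.

```python
# Totals from Table 3 (July 1993 through July 1995).
parallel = 15
serial_vector = 196
not_stated = 233
total = parallel + serial_vector + not_stated

# Fraction of all articles that explicitly cite a parallel computer.
stated_fraction = parallel / total

# Upper bound: assume the "not stated" articles split in the same
# proportion as the articles that do name their machine.
upper_bound = parallel / (parallel + serial_vector)

print(f"{100 * stated_fraction:.1f}% stated, "
      f"{100 * upper_bound:.1f}% estimated upper bound")
```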

Why are so few research simulations performed
on parallel computers? Certainly, research on
parallel computing has shown the capability for
solving a wide range of fluid dynamics problems.
At the Parallel CFD ’95 Conference, applications
of parallel computing were presented for
reacting flows, Euler and Navier-Stokes solvers,
spectral methods, multigrid methods, and adaptive
schemes. Numerous other applications
have been developed (see, for example,
[41, 42, 43, 44]).

Table 3: Citations of Parallel and Other Computers (July 93 - July 95)

Journal                      Parallel   Serial/Vector   Not Stated   Total
AIAA Journal                     7          104            110        221
Journal of Aircraft              0           48             47         95
Journal of Fluid Mechanics       8           44             76        128
Total                           15          196            233        444
Percent                        3.4%        44.1%          52.5%     100.0%

I posed this question to a number of experts in
parallel CFD. The answers tended to be fairly
similar, and not at all surprising. All focused
on the issue of calendar time required to solve a
particular problem. As one person stated, “The
machine which you use to solve a problem is
irrelevant. The only thing that matters is how
quickly you can get the problem done.” At the
present time, many CFD researchers who are
not using parallel computing view parallel CFD
as 1) lacking a decisive performance
advantage (e.g., MFlops) over conventional
serial (and vector) computers in many instances,
2) difficult to program efficiently, and 3) lacking
in portability.

All of these factors are likely to diminish in the
near future, and thus the use of parallel computing
in aerospace research should increase.
Microprocessor CPU performance continues to
improve by a factor of 1.5 to 2.0 per year⁷ [45], and
consequently parallel machines are now comparable
to or faster than traditional vector
supercomputers. For example, the Cray T3D (512
processors) is on average 41% faster⁸ than the
Cray C-90 (16 processors) for the three
simulated CFD applications in the NAS Parallel
Benchmarks. The Cray T3D (1024 processors)
is 128% faster. The IBM SP2-WN (160 processors)
was also significantly faster than the Cray
C-90 (16 processors) [16]. Also, the emergence
of standards in parallel programming languages
(e.g., HPF) and message passing functions (e.g.,
PVM, MPI) simplifies the development of parallel
code and significantly enhances its portability.

⁷ The rate of improvement of microprocessor
performance is much faster than for the specialized
processors developed for traditional vector machines
(e.g., Cray C-90).

⁸ I.e., the ratio of the execution time on the Cray C-90
to the Cray T3D was 1.41.

5  Parallel  Computing  in
Aerospace  Industry

Parallel computing has a major presence in the
aerospace industry. Within the past few years,
several major aerospace corporations have
developed extensive Networks of Workstations
(NOWs) for production analysis and design.
Two examples are Pratt & Whitney (East Hartford,
CT, and Palm Beach, FL) and McDonnell
Douglas Aerospace (St. Louis, MO).

Pratt & Whitney (P&W) initiated their Network
of Workstations concept [46] in mid-1989.
The decision was motivated by two factors.
First, P&W had an installed base of workstations
which had been acquired principally for
design/drafting work, but which were effectively
unused in the evenings and on weekends. Thus,
there was a surplus of compute cycles which
could be employed for analysis and design,
provided that the computational tasks could be
decomposed and parallelized. Second, their
existing Cray X-MP, purchased in 1986, was both
severely overloaded and limited in capability
(e.g., memory). Hence, there was a significant
incentive to invest resources in development of
a new paradigm for CFD analysis and design.

The P&W approach consists of several parts.
The flow solver is NASTAR, a 3-D structured
grid multi-block Navier-Stokes code. Domain
decomposition is straightforward, i.e.,
each block is assigned to an individual processor
(workstation). An example is shown in Fig. 3.
The momentum, energy and turbulence scalar
equations are solved using Successive Line
Under Relaxation (SLUR). The SLUR iterations
are performed independently within each block,
with periodic updating of the boundary conditions
to transmit information between blocks.
The optimal updating strategy is found by
numerical experiments. The pressure correction
equation is solved to satisfy the continuity
equation, and employs a parallelized Preconditioned
Conjugate Residual (PCR) algorithm. The
majority of the computational effort is expended in
the pressure correction equation, and thus
considerable effort was focused on efficient
parallelization of the PCR algorithm. Management of
the individual block computations is performed
by Prowess (Parallel Running of Workstations
Employing SocketS), developed by P&W, which
provides communications, parallel job process
control, accounting, reliability and workstation
user protection. Communication between
individual workstations is performed directly
using sockets which emulate a file I/O paradigm.
Checkpointing is employed to achieve high
reliability. Workstation user protection is the
implementation of the P&W policy that the
interactive user has first priority on a
workstation. Thus, for example, Prowess suspends
(or terminates) any remote process executing
on a workstation as soon as any activity is
detected on the workstation’s keyboard or mouse.
Idle workstations capable of executing NASTAR
are identified using the Load Sharing Facility
(LSF) software from Platform Computing
Corporation.

The P&W workstation network employed for
parallel computing is substantial. Approximately
400 to 600 workstations are employed
daily for parallel CFD jobs at P&W’s East Hartford,
CT facility, and another 300 to 400
workstations at Palm Beach, FL. The growth in
usage of the workstation network for parallel CFD
applications is displayed in Fig. 7.

Figure 7: Daily parallel CFD throughput on
Pratt & Whitney’s East Hartford, CT workstation
network (from [46])

A critical element in the Network of Workstations
approach is the network configuration.
Adequate communications bandwidth is essential
for effective distributed parallel computing. The
P&W East Hartford, CT network architecture is
shown in Fig. 8. It includes multiple Fiber
Distributed Data Interface (FDDI) 100 Mbps
backbone networks connected by Digital Equipment
FDDI Gigaswitches. There are approximately
200 Ethernet segments.

Figure 8: Pratt & Whitney network backbone
in East Hartford, CT (from [46])

Pratt & Whitney has concluded that their Network
of Workstations paradigm has been successful.
Fischberg et al. [46] cite reductions in
design time of 50% and 67% for a high pressure
compressor design and a fan design, respectively.

McDonnell Douglas Aerospace initiated their
Network of Workstations concept [47] in late
1992. The decision was motivated by factors
similar to those at Pratt & Whitney. First, McDonnell
Douglas had a substantial number of workstations
(mostly Hewlett-Packard 7xx, plus a small
number of IBM RS/6000 and Silicon Graphics)
which had been acquired principally for CAD.
These workstations were typically utilized during
the daytime, and largely unused in evenings
and on weekends. Second, their existing Cray
X-MP/18 was both heavily loaded and limited
in capability (e.g., memory), and the corporate
financial position precluded a multi-million dollar
new supercomputer acquisition. Third, McDonnell
Douglas wanted to gain experience with
parallel computing technology.

The  McDonnell  Douglas  Aerospace  approach 
consists  of  several  parts.  The  flow  solver  is 
NASTD,  a  proprietary  3-D  structured  grid 
multi-block  compressible  Euler/Navier-Stokes 
code.  The  code  is  heavily  utilized  at  McDon¬ 
nell  Douglas,  with  typically  fifty  active  users. 
A  straightforward  domain  decomposition  is  em¬ 
ployed,  whereby  a  grid  block  (subdomain)  is  as¬ 
signed  to  an  individual  processor  (workstation). 
The  code  is  operated  in  a  master/slave  relation¬ 
ship  using  PVM  [32]  for  process  control  and  ex¬ 
plicit  message  passing  between  processors. 

Parallel computations using NASTD are performed
in the evenings and on weekends using
up to 400 workstations in clusters of 15 to
20 workstations per job. The reliability (i.e.,
the percentage of submitted jobs which complete
successfully) exceeds 95%. Numerous
difficulties were resolved in achieving this
performance, many of which were management issues,
e.g., negotiating scheduled hardware, software
and network maintenance (which had oftentimes
occurred at random intervals at night and on
weekends), and changing the perception that
the individual user “owned” the workstation and
could therefore reboot it whenever desired (thus
terminating any slave process in operation and
crashing the entire parallel computation).

6  Conclusions 

Several  main  conclusions  can  be  drawn  regard¬ 
ing  parallel  computing  in  CFD: 

•  There  is  a  large  number  of  vendors  of  par¬ 
allel  computers  whose  systems  offer  a  wide 
range  of  performance. 

•  Modern  parallel  computers  can  equal  or  ex¬ 
ceed  the  performance  of  the  largest  multi¬ 
processor  Cray  supercomputers. 

•  The  aerospace  industry  has  taken  a  leading 
role  in  the  application  of  parallel  computing 
to  practical  analysis  and  design. 

•  The aerospace research community (e.g.,
academia and research laboratories) has
taken a leading role in research on parallel
computing, but has not significantly
employed parallel computing in solving
aerospace research problems.

•  The development of message passing standards
(e.g., PVM and MPI) and data parallel
programming language standards (e.g.,
HPF) will expand the use of parallel computing
in CFD.

SUMMARY

This  paper  describes  the  portable  parallelization  of  the 
FLOWer  code,  a  large,  block  structured  CFD  solver  for 
industrial  use.  Basic  requirements  for  the  parallelization 
are  identified,  and  the  strategies  applied  for  its  parallel¬ 
ization  are  explained.  Special  emphasis  is  put  on  the 
parallel  heart  of  the  program,  the  communications  li¬ 
brary  CLIC-3D.  Results  obtained  on  several  platforms 
demonstrate  the  success  of  the  method  chosen  and  allow 
an  assessment  of  today's  capabilities  of  parallel  comput¬ 
ers  in  CFD  applications.  Parallel  computations  of  air¬ 
craft  configurations  of  varying  complexity  prove  that 
parallel  computers  have  become  operational  in  aircraft 
development. 

LIST  OF  SYMBOLS

c_p   specific heat at constant pressure
D     vector of artificial dissipative fluxes
E     total energy
F     flux tensor
H     total enthalpy
k     heat transfer coefficient
N_B   number of blocks
n     outward pointing unit normal vector
Pr    Prandtl number
p     pressure
q     velocity vector
R     residual vector
S     speed-up
T     temperature
t     execution time
u     velocity in x-direction
V     volume
v     velocity in y-direction
W     vector of conservative variables
w     velocity in z-direction
γ     ratio of specific heats
μ     viscosity
ρ     density
σ     normal stress components
τ     shear stress components
      components of the energy dissipation function

Indices

alg   algorithmic ideal
ijk   discrete point
l     laminar
t     turbulent
x     in x-direction
y     in y-direction
z     in z-direction
∞     at infinity

1.  INTRODUCTION 

Looking at the progress made in CFD during the
last decade, one observes improvements
in two directions: the algorithms have become more
flexible and faster, e.g. through multigrid techniques,
and the hardware platforms have increased in main
memory and CPU performance. As far as the progress
in computer power is concerned, experts predict that
only parallel architectures will allow further
improvements leading to peak
performances of about 1 TFLOP/s [1, 2].

Therefore, since this type of supercomputer might
require a new type of application software, the
development of parallel flow solvers is mandatory if one
wants to exploit their abilities in the future. This could
be treated as an isolated subject when dealing with
questions of basic research interest, but for
large codes in industrial use several constraints
limit the development.

First of all, the effort spent on parallelization must be
justified by the gain in compute power or the reduction
of computing costs, respectively. Secondly, large CFD
solvers usually have been developed over a long
period involving a number of different scientists, and
they are applied by numerous users; both groups must be
respected by a parallelization. Last but not least, there is
not just one parallel architecture available at the
moment, but the platforms differ in the design of the CPUs
(vector versus RISC processors), the memory
organization (shared versus distributed memory) and the
communication systems (hardware and software).
Therefore, if one wants to be able to follow any hardware
development in the future, one must keep the
parallelization as flexible as possible.

The  paper  presented  here  describes  the  portable  parallel¬ 
ization  of  the  FLOWer  code  which  is  currently  carried 
out  within  the  project  POPINDA  (POrtable  Paralleliza¬ 
tion  of  INDustrial  Aerodynamical  applications)  funded 
by  the  German  Ministry  of  Research  (BMBF).  The 
FLOWer  code  is  a  block  structured  CFD  solver  for  com¬ 
plex  flows  in  configuration  aerodynamics.  It  has  directly 
evolved  from  the  DLR-CEVCATS  code  [3]  and  is  de¬ 
veloped  in  close  cooperation  of  the  DLR  with  the  Ger¬ 
man  national  research  center  for  computer  science 
(GMD)  and  the  German  aeronautical  industry  (DASA) 
as  a  multi  purpose  flow  solver. 

After  a  description  of  the  numerical  algorithm  of  this 
large  CFD  code  in  the  next  section,  the  strategy  chosen 
for  its  parallelization  will  be  explained  outlining  the 
ideas  of  how  to  meet  the  requirements  for  large  applica¬ 
tion  programs  in  industrial  use.  The  communications  li¬ 
brary  CLIC-3D  which  solves  the  portable  parallelization 
problem  of  such  codes  is  then  reviewed. 

Benchmark results obtained on various platforms
demonstrate the portability of the FLOWer code and allow
an assessment of today's parallel platforms.
Furthermore, computations of different aircraft configurations
show that such architectures have become operational
for CFD applications, and which effects influence the
obtainable performance. Finally, the computation of a
6 million grid point test case on 129 processors indicates
the future potential of parallel processing in CFD.

2.  NUMERICAL  METHOD  OF  THE  FLOWer 
CODE 

2.1  Governing  Equations 

The FLOWer code solves the Euler or Navier-
Stokes equations in conservative form [3, 4], written as

$$\frac{\partial}{\partial t} \int_V \vec{W}\, dV + \int_{\partial V} \bar{\bar{F}} \cdot \vec{n}\, dS = 0 \tag{1}$$

where W denotes the vector of conservative variables

$$\vec{W} = [\rho,\ \rho u,\ \rho v,\ \rho w,\ \rho E]^T \tag{2}$$

and F is the flux tensor being defined by

[Equation (3), the definition of the flux tensor, is not legible in the source.]

with the abbreviations

[Equation (4), the components of the energy dissipation function; only the right-hand side of the y-component is legible in the source:]

$$u\,\tau_{xy} + v\,\sigma_y + w\,\tau_{yz} - k\,\frac{\partial T}{\partial y}$$

The elements of the viscous stress tensor are determined
by Newton's law of skin friction, i.e.

[Equation (5), the normal and shear stress components, is not legible in the source.]

Further simplification is obtained by applying a thin
shear layer approximation accounting only for gradients
normal to surfaces.

For the non-dimensional pressure and temperature the
following relations hold

$$p = (\gamma - 1)\,\rho \left[ E - \frac{u^2 + v^2 + w^2}{2} \right], \qquad T = \frac{p}{\rho} \tag{6}$$

and the system is closed by the relations for the
transport coefficients

$$\mu = \mu_l + \mu_t, \qquad k = c_p \left( \frac{\mu_l}{Pr_l} + \frac{\mu_t}{Pr_t} \right) \tag{7}$$

where the laminar viscosity $\mu_l$ is given by Sutherland's
formula

$$\frac{\mu_l}{\mu_\infty} = \left( \frac{T}{T_\infty} \right)^{3/2} \frac{T_\infty + 110\,\mathrm{K}}{T + 110\,\mathrm{K}} \tag{8}$$

In turbulent flows the eddy viscosity $\mu_t$ is computed
from the algebraic Baldwin-Lomax model [5].
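
Sutherland's formula (8) is simple to evaluate. The following Python sketch assumes temperatures in kelvin and the Sutherland constant of 110 K quoted above; the function name and the sample temperatures are hypothetical.

```python
def sutherland_ratio(T, T_inf, S=110.0):
    """Laminar viscosity ratio mu_l / mu_inf from Sutherland's formula.

    T and T_inf are absolute temperatures in kelvin; S is the
    Sutherland constant in kelvin.
    """
    return (T / T_inf) ** 1.5 * (T_inf + S) / (T + S)


# At the reference temperature the ratio is one by construction.
ratio_ref = sutherland_ratio(300.0, 300.0)

# Doubling the temperature raises the laminar viscosity.
ratio_hot = sutherland_ratio(600.0, 300.0)
```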

2.2  Discretization  and  Time  Integration 

The governing equations are discretized by the method
of lines, separating the space and time coordinates. After
the space discretization, a system of ordinary differential
equations in time results, involving each finite volume.
For any hexahedron of the structured grid one obtains
the equation

$$\frac{d}{dt} \vec{W}_{ijk} + \frac{1}{V_{ijk}} \int_{\partial V_{ijk}} \bar{\bar{F}}_{ijk} \cdot \vec{n}\, dS = 0 \tag{9}$$

The space discretization is central, so that an artificial
dissipation term due to Jameson et al. [6] is added,
damping high frequency oscillations and allowing a
sufficiently sharp resolution of shock waves in the flow
field. The resulting system of equations then reads

$$\frac{d}{dt} \vec{W}_{ijk} + \frac{1}{V_{ijk}} \left( \vec{R}_{ijk} - \vec{D}_{ijk} \right) = 0 \tag{10}$$

with $\vec{R}_{ijk}$ and $\vec{D}_{ijk}$ being the vector of the residuals and
the artificial dissipative fluxes, respectively.

The  time  integration  is  carried  out  by  an  explicit,  hybrid 
multi  stage  Runge-Kutta  scheme  which  is  accelerated 
by  the  techniques  of  local  time  stepping,  enthalpy  damp¬ 
ing  (Euler)  and  implicit  residual  smoothing  [7]. 
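
The structure of an explicit multistage scheme can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the stage coefficients 1/4, 1/3, 1/2, 1 are a commonly used four-stage set and are not taken from the FLOWer code, the hybrid treatment of the dissipative fluxes and the acceleration techniques are omitted, and a scalar model problem stands in for the right-hand side of equation (10).

```python
def multistage_step(w, rhs, dt, alphas=(0.25, 1.0 / 3.0, 0.5, 1.0)):
    """One explicit multistage step for dw/dt = rhs(w).

    Every stage restarts from the solution at the beginning of the step:
        w_k = w_0 + alpha_k * dt * rhs(w_{k-1})
    so only w_0 and the current stage have to be stored.
    """
    w0 = w
    for alpha in alphas:
        w = w0 + alpha * dt * rhs(w)
    return w


# Model problem dw/dt = -w with w(0) = 1; the exact solution is exp(-t).
w_new = multistage_step(1.0, lambda w: -w, dt=0.1)
```

For this linear model problem the four stages reproduce the Taylor series of the exact solution up to fourth order, so `w_new` agrees with exp(-0.1) to about seven digits.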

This  procedure  is  embedded  into  a  powerful  multigrid 
algorithm  [3,  8]  which  allows  standard  single  grid  com¬ 
putations  as  well  as  a  successive  grid  refinement  and 
simple  or  full  multigrid,  respectively.  As  is  illustrated  in 
[3],  where  a  more  detailed  description  can  be  found, 
high  convergence  rates  can  be  obtained,  using  this  tech¬ 
nique. 


2.3  Block  Structure

Since structured grids around complex geometries
cannot be generated as one logically rectangular domain,
the FLOWer code is block structured. That means that
the flow field is split into regions for each of which the
generation of a structured grid is possible. Figure 1
shows schematically such a grid topology around a
transport aircraft. As one can see, the flow field is
subdivided into four areas of similar size around the wing
body. Three subdomains are covered by one block each
(blocks 1 to 3), whereas the fourth region is further
subdivided, due to the presence of an engine there (blocks 4
to 9). The engine is surrounded by a polar grid (blocks 8
and 9) which is adapted by blocks 6 and 7 to the general
O-H topology (blocks 3 to 5).

The program then treats the blocks more or less
independently of each other, which can only be done
properly by exchanging data of the current solution at block
interfaces before each time step.

Therefore, the blocks are surrounded by layers of
dummy cells, which at block intersections correspond
with the physical cells of the neighboring domain. The
FLOWer code allows an overlap width of two cells,
resulting in second order accuracy of the scheme at those
boundaries. This is necessary in order to treat the
artificial dissipation terms correctly, which otherwise could
spoil the solution as shown in [9].

Currently, the FLOWer code allows different exchange
strategies for the data at block intersections varying in
effort and accuracy [10].

Fig. 1 Schematic multiblock decomposition of the flow
field around a generic transport aircraft.
Decomposition into 9 blocks due to the adaption
of an engine fitted polar mesh to a global O-H
topology.

3.  PARALLELIZATION  OF  THE  FLOWer  CODE

3.1  Requirements 

When parallelizing a large CFD solver such as the FLOWer
code, the parallelization cannot be treated in isolation,
but must be integrated into the general development
procedure [9]. Therefore, certain objectives must be
met, the most important of which are specified in the
following.

Portability 

The FLOWer code is developed by a number of
scientists working at different locations on a variety of
computers. Furthermore, it is applied by several users
running the program on platforms other than those of the
developers. Finally, the lifetime of the program will
certainly exceed that of most of today's computers,
so that portability is a major requirement:

The FLOWer code must run on any platform, whether
sequential or parallel!

Conservation  of  the  development  history 
When  developing  the  parallel  FLOWer  code,  its  algo¬ 
rithm  had  already  reached  a  high  degree  of  maturity  es¬ 
tablished  by  various  scientists  during  a  long  period 
within  the  DLR-CEVCATS  code.  Moreover,  the  users 
had  become  experienced  with  its  handling  and  in  inter¬ 
preting  its  results.  Therefore,  the  parallelization  had  to 
respect  that  development  history: 

The FLOWer code must not be completely rewritten
because of its parallelization!

Low  parallelization  effort 

Parallelization  is  only  one  means  of  high  performance 
computing  and  should  not  be  done  just  for  its  own  sake. 
The  effort  spent  for  parallelization  must  therefore  be 
justified  by  the  corresponding  gain  in  performance  or  re¬ 
duction  in  computational  costs,  respectively: 

The parallelization of the FLOWer code must achieve
the highest possible performance at the lowest cost!

3.2  Parallelization  Strategies 

Parallelization of a CFD solver means mapping the
inherent parallelism of the program onto a parallel
architecture using a communication model. As far
as structured codes are concerned, there is parallelism
at the statement level (multiply / add), in the data (loops
over all points of a block) and in the geometry (the
different blocks). It can be expressed by parallelizing
languages, e.g. HPF or C++, by parallelizing compilers
(directives / autotasking) or by message passing, i.e. by
explicitly sending and receiving data to and from
different processes. Moreover, the parallel hardware design
varies with respect to the arrangement of CPUs,
memory and the interconnecting network (shared /
distributed memory, hybrid constructions) [9].

Therefore, one has to decide which type of parallelism
should be exploited using which communication model,
and how to achieve portability. When parallelizing the
FLOWer code, general considerations led to the
following guidelines, which allow the requirements stated
above to be met:

Grid  partitioning  as  parallelization  strategy 

The  idea  is  to  map  the  different  blocks  to  different  pro¬ 
cesses  where  they  are  solved  separately.  Between  the  it¬ 
eration  steps  the  boundary  data  are  exchanged  via  the 
network. 
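
The boundary exchange between two neighboring blocks can be sketched as follows. This is a one-dimensional, cell-centered illustration with hypothetical names, not the CLIC-3D interface; in a parallel run the two copies would be messages between the processes that own the blocks. The overlap width of two cells follows the value stated for the FLOWer code.

```python
HALO = 2  # overlap width in cells


def make_block(interior):
    """Pad the interior cell values with HALO dummy cells on each side."""
    return [0.0] * HALO + list(interior) + [0.0] * HALO


def exchange(left, right):
    """Fill the dummy cells at the interface shared by two blocks.

    Each dummy layer receives a copy of the neighbor's interior
    cells that lie next to the interface.
    """
    # Right dummy layer of the left block <- first interior cells of the right block.
    left[-HALO:] = right[HALO:2 * HALO]
    # Left dummy layer of the right block <- last interior cells of the left block.
    right[:HALO] = left[-2 * HALO:-HALO]


left = make_block([1.0, 2.0, 3.0, 4.0])
right = make_block([5.0, 6.0, 7.0, 8.0])
exchange(left, right)
```

After the exchange each block can evaluate its stencil up to the interface using only local data, which is what allows the blocks to be advanced separately between two exchanges.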

This  technique  is  not  only  said  to  be  efficient  when  solv¬ 
ing  partial  differential  equations  [11,  12],  but  moreover 
guarantees  the  conservation  of  the  sequential  develop¬ 
ment  history,  because  it  is  directly  based  on  the  sequen¬ 
tially  well  established  multi  block  method. 

Separation of computation and communication
A strict application of this rule allows an algorithmic
development which remains independent of the
parallelization or other hardware aspects. Additionally, the
code structure can be kept modular more easily, which is
highly desirable for software engineering reasons.
Finally, the portability problem becomes much easier to
handle when the communication parts are concentrated
within separate units.

Communication  by  message  passing 
Besides  efficiency  arguments,  the  decision  for  the  mes¬ 
sage  passing  communication  model  results  mainly  from 
the  portability  demand.  Using  a  parallelizing  language 
would  have  caused  a  complete  re-implementation  of  the 
FLOWer  code  which  was  clearly  unacceptable,  and  par¬ 
allelizing  compilers  are  only  available  on  some  plat¬ 
forms  restricting  the  portability  of  the  code. 

On the other hand, it should be noted that the message
passing approach does not exclude a parallelization by
autotasking supported by compiler directives [9].

Use of a portable communications library
Combining the requirements for the parallelization of
the FLOWer code with the above guidelines leads to the
demand for a portable communications library.
Such a high level library should perform all typical
operations necessary in parallel mode involving
communication between the different processes. Its usage should
therefore guarantee parallel portability, keep the
sequential code almost unchanged and reduce the
parallelization effort drastically. Moreover, it should lead to a
highly reliable parallelization.

This library has been developed by the GMD as
CLIC-3D and will be described in the following section.

4.  THE  CLIC-3D  COMMUNICATIONS  LIBRARY

4.1  Background

The  communications  library  CLIC-3D  (Communica¬ 
tions  Library  for  Industrial  Codes  in  3  Dimensions)  is 
currently  developed  by  the  GMD  within  the  German  re¬ 
search  project  POPINDA.  It  is  based  on  the  former 
GMD  Comlib  library  and  supports  general  block  struc¬ 
tured  PDE  solvers,  particularly  involving  multigrid  al¬ 
gorithms.  Its  development  was  based  on  the  observation 
that  for  this  class  of  programs  the  communication  pat¬ 
terns  are  generally  quite  similar,  although  the  numerical 
algorithms  might  differ  considerably. 

The major aim of the CLIC development is to make programming
for complex geometries as easy as for simple single block
domains by providing high level routines for all communication
and mapping tasks. The CLIC user interface, therefore, provides
the application program with all necessary data on the problem
to be solved.

Currently,  the  CLIC  library  supports  cell  vertex  and  cell 
centered  discretizations. 

The portability of the CLIC library is achieved by using PARMACS
as the portable message passing interface [13]. This system was
chosen because it is a commercial product and not public domain
like PVM [14], and because MPI [15] was not yet available at the
time when the POPINDA project started. The corresponding software
layers of the parallelized FLOWer code are illustrated in
figure 2.


Fig.  2  Software  layers  of  the  parallel  FLOWer  code 
4.2  General  Code  Structure 

Since the CLIC library is based on the PARMACS message passing
system, it is designed for a host-node (master-slave) programming
model. The host process starts the distributed application on
several nodes, performs the input and output and transfers data
to and from the node processes. The host itself does not
participate in the solution process, which is exclusively carried
out by the node processes. Consequently, the user application
program is separated into a host and a node program as shown in
figure 3.


Fig.  3  Host-node  structure  of  the  parallel  FLOWer 
code 

The  host  program  reads  in  the  same  input  parameters  as 
the  sequential  user  program.  Then,  CLIC  routines  read 
in  the  description  of  the  block  structured  grid,  create  the 
node  processes  and  map  the  blocks  onto  the  allocated 
node  processors  respecting  load  balance  aspects  as  far 
as  possible.  Then,  the  input  parameters  are  distributed  to 
the  node  processes.  Finally,  another  routine  reads  in  the 
grid  coordinates  and  sends  them  to  the  corresponding 
node  processes  only.  After  reading  and  distributing  all 
data  to  the  nodes,  the  host  process  waits  for  output  gen¬ 
erated  by  the  node  processes  and  writes  it  to  the  desired 
units. 

Each node process executes an identical node program
which  may  contain  the  complete  sequential  code.  In 
case  of  the  FLOWer  code,  the  only  differences  in  paral¬ 
lel  mode  are: 

•  The  input  data  is  not  read  in  but  received  from  the 
host  process 

• Global operations involving all blocks are passed to
the CLIC library for execution

•  The  data  exchange  at  block  boundaries  is  carried  out 
fully  automatically  by  the  CLIC  library 

•  Write  statements  are  replaced  by  parallel  output  rou¬ 
tines  of  the  CLIC  library 

A  schematic  flow  chart  of  the  parallel  FLOWer  code  is 
given  in  figure  4. 

Further activities of the CLIC library consist in the analysis
of the given block structure, in order to allow a special
treatment of grid singularities. For each segment edge and point,
the adjoining blocks and the number of adjoining cells are
determined, leading to a topological classification. If the
segment is part of the physical boundary, the boundary conditions
of all adjoining blocks are determined additionally. Finally,
geometrical singularities are detected, so that the user can
inquire all data needed for a special treatment of irregular grid
points.

Fig. 4 Schematic flow chart of the parallel FLOWer code based on
the CLIC library (host, node 1 and node 2; control stream and
data stream)

The same data is needed by the CLIC library for the optimization
of the data exchange at block boundaries. The aim is to send the
minimum number of messages necessary for a correct update of the
boundaries. This is important especially on the coarse grids of
multigrid algorithms, where the communication may become
significantly time consuming. The basic idea is the introduction
of a global orientation for larger portions of the block
structure, leading to a fast exchange procedure. Only in
topologically more complicated situations must additional
messages be sent.

Another specialty of the CLIC library is the possibility of
parallel output, i. e. output files can be directly written by
the node processes.

4.3 Examples of High Level CLIC Operations

Exchange of boundary data

As already mentioned, the grid partitioning strategy requires an
exchange of boundary data at the interfaces of the blocks.
Therefore, the information on the topology of the block structure
is stored in terms of block surface segments in a file that is
read in by the CLIC library. During the initialization phase,
this information is analyzed with respect to the necessary send
and receive operations within a data exchange procedure.

When the corresponding exchange routine is called on each
process, as sketched in figure 5, all interface data of the
blocks on a process is stored segmentwise in a respective buffer
which is sent (asynchronously blocking) to the corresponding
neighboring block on another process. Afterwards, messages of the
other processes are received in the order they come in, and the
buffers are unpacked. If necessary, the procedure is repeated for
segment edges and corner points, so that finally all block
interfaces are updated correctly.

Fig. 5 Schematic flow chart for a parallel data exchange
(exchange of segment data between process 1 and process 2).

Global operation

Global operations involving all blocks of a given block structure
are necessary, e. g. for the computation of a global residual.
They are carried out within another special CLIC routine using an
embedded binary tree for the process topology. As illustrated by
figure 6, each parent process receives data from its child
processes, performs a local operation on this data and
communicates it to its own parent process. Afterwards, the
process waits for a message from its parent process containing
the correct global value which is obtained at the end of the
chain. After its reception, this value is further communicated to
the corresponding child processes.
Fig. 6 Schematic flow chart for a global operation.
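
The global operation described in the text (figure 6) can be illustrated by a minimal Python sketch. It is not the CLIC routine itself: the function name `tree_global_operation`, the heap-like numbering of the processes (process i has the children 2i+1 and 2i+2) and the serial simulation of the message exchange are assumptions made for illustration only.

```python
from typing import Callable, List


def tree_global_operation(local_values: List[float],
                          op: Callable[[float, float], float]) -> List[float]:
    """Simulate a global operation on a binary tree of processes.

    Assumed embedding (not taken from the paper): process i has the
    children 2*i + 1 and 2*i + 2, and process 0 is the root.
    Returns the value every process holds after the operation.
    """
    n = len(local_values)
    if n == 0:
        return []
    partial = list(local_values)

    # Upward sweep: children have larger indices than their parent, so
    # walking from the last process to the first guarantees that each
    # process has combined the data of all its children before it
    # "sends" its partial result to its own parent.
    for i in range(n - 1, 0, -1):
        parent = (i - 1) // 2
        partial[parent] = op(partial[parent], partial[i])

    # Downward sweep: the root now holds the global value and every
    # parent passes it on to its children.
    result = [partial[0]] * n
    for i in range(1, n):
        result[i] = result[(i - 1) // 2]
    return result


if __name__ == "__main__":
    # Hypothetical local residuals on five node processes.
    residuals = [3.0, 1.0, 4.0, 1.0, 5.0]
    print(tree_global_operation(residuals, max))
```

With P processes the upward and the downward sweep each need a number of communication stages proportional to log2(P), which is the usual motivation for a tree topology in such reductions.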
5. RESULTS

As  a  first  result  it  should  be  noted  that  the  parallel 
FLOWer  code  using  the  CLIC  library  meets  all  of  the  re¬ 
quirements  stated  above: 

•  The  FLOWer  code  is  fully  portable,  in  sequential  as 
well  as  in  parallel  mode. 

•  The  effort  spent  for  the  development  of  the  sequen¬ 
tial  FLOWer  code  and  its  predecessor  CEVCATS 
was  fully  conserved. 

• The effort needed for the parallelization was
extremely low.

The  results  obtained  with  this  code  are  given  in  the  fol¬ 
lowing. 

5.1  Performance  Measurements 

Since  parallelization  is  a  means  of  increasing  the  com¬ 
pute  power  for  CFD  applications,  performance  measure¬ 
ments  were  carried  out  on  several  platforms.  With  this 
not  only  the  portability  of  the  FLOWer  code  is  demon¬ 
strated,  but  an  assessment  of  different  architectures  is 
possible. 

As a test case, the flow around a non-swept wing consisting of
NACA 0012 airfoils was computed at M = 0.6 and α = 0° (figure 7).
Two different grids with 40000 and 320000 cells, respectively,
were used; they were subdivided into 1, 4 and 8 blocks in the
small case and into 1, 4, 8, 16 and 32 blocks in the large case.
Each block was of equal size and was mapped to one CPU on the
parallel machines, leading to an ideal load balance.


Fig.  7  NACA  0012  wing  test  case  for  performance 
measurements. 

Figures 8 and 9 show the computing times obtained on various
parallel and vector machines relative to the time needed on a
Cray C90 single processor. As can be seen, the single processor
performance of the NEC SX-3 is hard to beat, even by parallel
vector computers using up to 8 CPUs. On the other hand, the
results show that parallel RISC processor architectures, such as
the IBM SP2 or the NEC Cenju-3, are able to compete with or even
to outperform the Cray C90 single processor using a moderate
number of 32 CPUs. The CM-5 and the Intel Paragon turned out to
have weaker single processors, so that they need many more CPUs
in order to reach the performance of the other machines.


Fig.  8  Relative  execution  times  on  single  processor 
vector  computers 



Fig.  9  Relative  execution  times  on  parallel  computers 

5.2  Speed-up  for  Aircraft  Configurations 

For evaluating the potential of the parallelization of the
FLOWer code, speed-up measurements were carried out for a more
realistic configuration. The inviscid flow around the generic
DLR-F4 wing-body combination shown in figure 10 was computed on
a grid consisting of approximately 410000 cells which was
subdivided into 1, 4 and 8 equally sized blocks, respectively.
For the conditions of Mach number M = 0.75 and incidence α = 0°,
35 W-cycles involving 4 multigrid levels were performed on an
IBM SP1 computer.

The results obtained for the different communication systems
available there are plotted as speed-up versus processor number
in figure 11. As can be seen, PVM using an Ethernet connection
restricts the number of processors that can usefully be employed
to only four, indicating that workstation clusters based on
Ethernet are not suitable for parallel computations with the
FLOWer code. The result can be markedly improved by replacing
the Ethernet by the IBM high performance switch, but still the
fastest runs were obtained using the IBM MPL/p communications
system.
Fig. 10 Iso-Mach contours and block structure of the
DLR-F4 wing-body combination (M = 0.75, α = 0°).


Fig. 11 Speed-up versus processor number for the
DLR-F4 wing-body combination on IBM SP1.

What can be observed is that even with the most powerful
communications system on the IBM SP1, the acceleration obtained
deviates considerably from the linear speed-up. This effect is
caused by an increase of operations due to the multiple
computations of points at block interfaces.

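Therefore, there is an upper limit for the maximum acceleration,
below the linear speed-up, which can be obtained from single
processor computations of the multi block cases. This value is
called the algorithmic ideal speed-up and is defined as the ratio
of the computing times for the one block case and the multi block
case, multiplied by the number of processors that could be
employed, i. e. the number of blocks:

$$S_{\mathrm{ideal}} = N_{\mathrm{blocks}}\,\frac{T_{1\,\mathrm{block}}}{T_{N\,\mathrm{blocks}}}$$

where $T_{1\,\mathrm{block}}$ and $T_{N\,\mathrm{blocks}}$ denote the single processor computing
times of the one block and the multi block case, and $N_{\mathrm{blocks}}$ is
the number of blocks.
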
The algorithmic ideal speed-up is also plotted in figure 11 and,
as can be seen, is reached to a degree of approximately 95%
using the MPL/p system.
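
As a small worked illustration of this definition, the following Python sketch evaluates the algorithmic ideal speed-up and the degree to which a measured speed-up reaches it. The function names and all timings are hypothetical and chosen only to make the arithmetic visible; they are not measurements from the paper.

```python
def algorithmic_ideal_speedup(t_one_block: float,
                              t_multi_block: float,
                              n_blocks: int) -> float:
    """Ratio of single processor times (one block / multi block),
    multiplied by the number of blocks, as defined in the text."""
    if t_multi_block <= 0.0 or n_blocks < 1:
        raise ValueError("need a positive time and at least one block")
    return n_blocks * t_one_block / t_multi_block


def degree_reached(measured_speedup: float, ideal_speedup: float) -> float:
    """Fraction of the algorithmic ideal speed-up actually obtained."""
    return measured_speedup / ideal_speedup


if __name__ == "__main__":
    # Hypothetical single processor timings in seconds: the 8 block
    # case is slower because interface points are computed repeatedly.
    t1, t8 = 100.0, 125.0
    ideal = algorithmic_ideal_speedup(t1, t8, n_blocks=8)
    # Hypothetical measured speed-up on 8 processors.
    measured = 6.08
    print(f"ideal = {ideal:.2f}, reached = {degree_reached(measured, ideal):.0%}")
```

In this made-up case the block interfaces cost 25% extra work on one processor, so at most a speed-up of 6.4 instead of 8 can be expected on 8 processors.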

Of  course,  the  decrease  of  the  maximum  obtainable 
speed-up  reported  above  is  not  satisfactory,  but  on  the 
other  hand  it  is  questionable,  whether  its  value  is  mean¬ 
ingful  for  complex  CFD  problems  at  all.  First  of  all,  for 
speed-up  measurements  one  would  need  problems  that 
are  small  enough  to  be  computed  on  a  single  CPU,  in  or¬ 
der  to  get  a  reference  value.  Secondly,  the  decrease  in 
the  maximum  speed-up  is  only  felt,  because  it  was  pos¬ 
sible  to  compute  a  single  block  solution  for  the  DLR-F4 
wing-body  combination.  On  the  other  hand  the  multi 
block  cases  were  ideally  load  balanced,  because  all 
blocks  were  of  equal  size. 

When dealing with more complex configurations, this will
certainly not be the case. In such situations there will be
several blocks, for grid generation reasons, which cannot be
guaranteed to all have the same number of points. Therefore,
more complicated test cases must be studied.

5.3  Parallel  Computation  of  a  Generic  Aircraft 

As a more realistic configuration, the DLR-ALVAST generic
aircraft model carrying a high bypass engine [17] was computed at
a Mach number of M = 0.75 and an incidence of α = 1.0°. The grid
for this test case consists of about 575000 cells and is
subdivided into 11 blocks, the size of which varies between 4096
and 87552 cells. This is a typical situation where neither load
balancing nor single block computations are possible, both due to
grid generation reasons. The configuration is shown in figure 12.


Fig. 12 ALVAST generic aircraft configuration. Iso-Mach
lines at M = 0.75 and α = 0°.
In order to study the corresponding effects on the parallel
performance, 50 W-cycles were performed mapping the 11 blocks to
1, 7, 8 and 10 processors of an IBM SP2, respectively. The single
processor result was obtained on a slightly more powerful wide
node, whereas the parallel runs were obtained on weaker thin
nodes.

As can be seen from figure 13, on 8 processors a speed-up of 6.6
can be gained, but a further increase of the processor number
does not lead to any further improvement. This behavior is
exactly what must be expected when looking at the block structure
and the mapping strategy of the CLIC library.

The work load per processor is determined by the number of grid
points to be solved, and the largest number of points on any
processor determines the total execution time of the parallel
run. When mapping the 11 blocks to fewer than 11 processors,
there will always be more than one block on at least one CPU.
Therefore, the CLIC library applies a mapping strategy that tries
to distribute the blocks according to their size, so that the
work load on the nodes is as equal as possible.

Up to 8 processors one is able to continuously reduce the maximum
grid size per CPU by simply mapping the largest block of the
heaviest loaded node to an additional processor. But when
employing 8 nodes, the maximum work load is determined by the
absolutely largest block, which of course cannot be reduced any
further by mapping the block structure to more CPUs. Therefore,
the minimum computing time or maximum speed-up, respectively, is
obtained on 8 nodes and remains constant afterwards, as
illustrated by figure 13.
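
The saturation of the speed-up can be reproduced with a minimal Python sketch of a size based block mapping. The greedy rule used here (largest block first onto the least loaded processor) is an assumed stand-in for the CLIC mapping strategy, which is not specified in detail. The individual block sizes are hypothetical as well; only the smallest block (4096 cells), the largest block (87552 cells) and the total of about 575000 cells are taken from the text.

```python
import heapq
from typing import List

# Hypothetical sizes of the 11 blocks (cells); only the extremes and
# the approximate total of 575000 cells follow the text.
BLOCK_SIZES = [87552, 80000, 76000, 70000, 65536, 60000,
               50000, 40000, 25000, 16384, 4096]


def map_blocks(block_sizes: List[int], n_procs: int) -> List[int]:
    """Greedy mapping: assign the blocks in descending size to the
    currently least loaded processor. Returns the load per processor."""
    loads = [0] * n_procs
    heap = [(0, p) for p in range(n_procs)]  # (load, processor index)
    heapq.heapify(heap)
    for size in sorted(block_sizes, reverse=True):
        load, p = heapq.heappop(heap)
        loads[p] = load + size
        heapq.heappush(heap, (loads[p], p))
    return loads


def speedup_bound(block_sizes: List[int], n_procs: int) -> float:
    """Upper bound on the speed-up if the run time is proportional to
    the largest number of cells on any processor (communication and
    node speed differences are ignored)."""
    return sum(block_sizes) / max(map_blocks(block_sizes, n_procs))


if __name__ == "__main__":
    for n in (1, 7, 8, 10):
        print(n, max(map_blocks(BLOCK_SIZES, n)),
              round(speedup_bound(BLOCK_SIZES, n), 2))
```

Under these assumptions the maximum load equals the largest block from 8 processors onwards, so the bound stays constant. With the figures quoted in the text, 575000 / 87552 is roughly 6.6, which agrees with the speed-up reported for 8 processors.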

Any  further  increase  of  the  speed-up  would  require  an 
additional  blocking  of  the  largest  block  which  is 
planned  to  be  automatically  supported  by  the  CLIC  li¬ 
brary  in  the  future. 


Fig.  13  Speed-up  versus  processor  number  for  ALVAST 
generic  aircraft  configuration  on  IBM  SP2. 


5.4  Feasibility  Study  for  Large  Problems 

Since parallelization is believed to be the appropriate method of
tackling the future grand challenge problems in design
aerodynamics, attempts must be made to demonstrate the
feasibility of this approach. Therefore, the viscous flow field
around the DLR-F4 wing-body combination (compare figure 10) was
computed on a grid generated by the Deutsche Airbus company
consisting of 6.6 million grid points subdivided into 128 blocks
of equal size. 800 multigrid cycles were performed on a 129
processor IBM SP2 (1 host + 128 nodes), which took less than
three hours of response time (13 seconds per cycle). The
convergence of the computation is given in figure 14 in terms of
the logarithmic density residual versus the number of multigrid
cycles.



Fig.  14  Density  residual  versus  number  of  multigrid 
cycles.  DLR-F4  wing-body  combination  (6.6 
million  cells)  on  129  processors  of  an  IBM  SP2. 

A grid convergence study was carried out by repeating the
computation on four grids, each differing in the total number of
points by a factor of 8. The result is given in figure 15 in
terms of the total lift coefficient versus the scaled grid size.
As one can see from an extrapolation of the development of the
lift between the levels three and one, the large grid size of
6.6 million cells is necessary in order to get the lift within an
accuracy of one percent. Since the quality of the solution was
spoiled by regions of highly distorted cells, a repetition of the
study is planned with an improved grid.

Nevertheless,  what  is  proven,  is  that  such  large  scale 
problems  can  be  treated  with  the  parallel  FLOWer  code 
and  that  today's  parallel  hardware  is  already  allowing 
such  computations.  Therefore,  this  study  is  a  promising 
demonstration  of  the  potential  of  parallel  processing  in 
CFD  heading  towards  the  solution  of  the  aerodynamic 
grand  challenges  expected  in  the  future. 
Fig. 15 Total lift coefficient versus scaled number of grid
points. DLR-F4 wing body combination
computed on 129 processors of an IBM SP2.

6.  CONCLUSIONS 

This  paper  shows,  how  the  computational  power  of  par¬ 
allel  architectures  is  exploited  by  the  three-dimensional 
structured  solver  for  complex  flows  FLOWer  under  the 
restricting  demands  for  portability,  conservation  of  the 
former  development  history  and  minimization  of  the 
parallelization  effort. 

The basic considerations to use the grid partitioning approach
as parallelization strategy and to strictly separate
communication and computation lead to the implementation of the
message passing based portable CLIC-3D communications library
supporting any high level operation occurring in typical partial
differential equation solvers on structured meshes. With this
library the parallelization meets all general requirements for
the development of large codes in industrial use.

Performance  measurements  on  a  large  variety  of  com¬ 
puters  of  different  architecture  demonstrate  the  compre¬ 
hensive  portability  of  the  CLIC  based  FLOWer  code  and 
allow  an  assessment  of  today's  hardware  capability  in 
CFD.  Still  the  NEC  SX-3  vector  computer  appeared  to 
be  the  most  powerful  machine  solving  a  standard  bench¬ 
mark  problem,  but  with  a  moderate  number  of  32  RISC 
processors  the  IBM  SP2  already  outperforms  a  Cray 
C90  single  processor,  and  a  32  processor  NEC  Cenju-3 
is  at  least  competitive. 

Speed-up studies for a typical wing-body combination show that
the communication system has a decisive influence on the
achievable overall acceleration. It turns out that Ethernet based
workstation clusters communicating via PVM are not suitable to
replace true parallel computers as far as performance is
concerned. Additionally, the maximum speed-up to be obtained is
algorithmically limited by the grid partitioning strategy,
because points at block interfaces are multiply computed,
increasing the total number of operations. But this drawback is
only felt for simple problems, where a speed-up can still be
measured and which are, therefore, far away from being a grand
challenge.

Parallel computations of a generic aircraft consisting of a
wing-body combination carrying a pylon with an engine demonstrate
that the complexity of today's problems in configuration
aerodynamics can be tackled on a parallel computer. Speed-up
measurements with respect to a multiblock single processor
computation give satisfactory results, but also reveal the
necessity for an automatic load balancing tool that allows an
initial block structure to be mapped to a higher number of
processors than given blocks.

Finally,  a  Navier-Stokes  computation  of  the  flow  field 
around  a  wing-body  combination  on  a  grid  consisting  of 
6.6  million  points  on  a  129  processor  IBM  SP2  outlines 
the  potential  of  parallel  processing  in  CFD  for  the  fu¬ 
ture.  It  proves  that  high  numbers  of  processors  can  be 
successfully  handled  in  numerical  aerodynamics  and 
that  parallelization,  indeed,  is  a  promising  means  for 
solving  the  grand  challenge  problems. 
Abstract


In the present paper we introduce and discuss an efficient
parallel algorithm for the spectral multi-domain solution of the
incompressible Navier-Stokes equations. First, the algorithm is
given in its basic form for the 2-dimensional case and, later on,
a possible extension to 3-dimensional flows exhibiting a
homogeneous (periodic) direction is proposed. The algorithm is
validated both for its parallel performance and its accuracy.


1  INTRODUCTION 


In recent years domain decomposition methods have gained much
attention in the CFD community. One of the most relevant features
of such methods is the possibility of tuning the accuracy of the
numerical discretization according to the expected behaviour of
the solution in each subdomain. Consequently, subregions of the
flow field containing sharp boundary layers can be enclosed
within subdomains with high resolution, while low resolution can
be assigned to subregions where smooth solutions can be expected.

These advantages can be fully exploited when discretizing the
equations with spectral methods, which guarantee a fast decay of
the error with the number of nodes, termed "spectral accuracy".

On the other hand, domain decomposition methods might provide a
natural stabilization strategy for the spectral discretization,
which is a "central one" in nature. In fact the local cell Peclet
number can be locally diminished by reducing the mesh spacing
within the critical subdomain, without the introduction of any
particular stabilization procedure.

From the computational point of view, the domain decomposition
technique is well suited for parallel computing, even if in
practical cases several difficulties arise whenever good
performance has to be reached [1].

In the first part of the present paper, a parallel algorithm for
the solution of the two-dimensional incompressible Navier-Stokes
equations is presented. After a brief introduction of the time
splitting scheme used for the time discretization of the unsteady
incompressible Navier-Stokes equations, the attention will be
mainly focused on the spectral multidomain approach and on its
parallel features. Performance results concerning the parallel
implementation on two different MIMD parallel architectures will
be presented. The second part of the paper is concerned with the
application of the algorithm to three-dimensional unsteady
problems.


2  NAVIER-STOKES  EQUATIONS  AND 
TIME  SPLITTING  SCHEME 

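When the incompressible Navier-Stokes equations

$$\frac{\partial U}{\partial t} + \frac{1}{2}\left(U\cdot\nabla U + \nabla\cdot(UU)\right) = -\nabla p + \frac{1}{Re}\,\Delta U \qquad (1)$$

$$\nabla\cdot U = 0 \qquad (2)$$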


are solved by means of a projection method [2], with the
diffusive terms treated in an implicit fashion [3], the time
stepping procedure consists in a cascade of scalar elliptic
kernels, to be solved at each time step. Namely, two (for the
two-dimensional equations) Helmholtz problems for the inversion
of the diffusive part and a Poisson problem for the pressure need
to be solved at each time step. It is then clear that, in order
to achieve a globally efficient algorithm, it is of fundamental
importance to tackle effectively the mentioned scalar problems.
For the sake of completeness, the adopted fractional step scheme
(i.e. Van Kan's pressure correction method [4]) is given in the
following.


(3) 
In the first step, a non-physical intermediate velocity field
$\hat{U}$ is computed. In fact, $\hat{U}$ does not satisfy the
incompressibility condition. Then, in the second step, $\hat{U}$ is
projected onto the divergence free space to get an adequate
velocity approximation of $U^{n+1}$.

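The scheme with the given boundary conditions is nothing else
than a second order Crank-Nicolson Adams-Bashforth scheme with an
$O(\Delta t^{2})$ deviation in the tangent direction of the boundary. By
applying the divergence operator to (6), it turns out that the
latter is equivalent to

$$\Delta\left(p^{n+1} - p^{n}\right) = \frac{2}{\Delta t}\,\nabla\cdot\hat{U} \qquad (7)$$

$$\left.\frac{\partial\left(p^{n+1} - p^{n}\right)}{\partial n}\right|_{\partial\Omega} = 0 \qquad (8)$$

$$U^{n+1} = \hat{U} - \frac{\Delta t}{2}\,\nabla\left(p^{n+1} - p^{n}\right) \qquad (9)$$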

In the next section the attention will be focused on the way
each scalar elliptic problem has been tackled in the framework
of a spectral multidomain discretization.
3  SPACE  DISCRETIZATION

In the present work, a Legendre spectral collocation technique
coupled with a domain decomposition method has been used for the
space discretization of the differential equations. Additional
references can be found in ([6], [5]) for the projection
decomposition method, and in ([7]) for the spectral approximation
method.

3.1  Elliptic  terms 

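The following problem, representative of one of the elliptic
scalar problems mentioned in the previous section, is considered
hereafter:

$$-\Delta u + \sigma u = f \quad \text{in } \Omega, \qquad f \in L^{2}(\Omega) \qquad (10)$$

$$u = 0 \quad \text{on } \partial\Omega \qquad (11)$$

where $\sigma$ is a real constant $\geq 0$, and where $\Omega$ is an open connected
set $\Omega \subset \mathbb{R}^{2}$; in particular, $\bar{\Omega} = \bigcup_{i=1}^{N} \bar{\Omega}_i$, where $\bar{\Omega}_i$ is a
closed rectangle having either a common side or a common vertex
with each neighbour; $\sigma \geq 0$ is either identically equal to zero
(i.e., for the Poisson problem related with the pressure) or is
equal to $2/\Delta t\,Re$ (i.e., for one of the momentum equations). The
equivalent weak formulation of (10), (11) is:

find $u \in H_0^1(\Omega)$ such that

$$l(u, v) = (f, v)_{L^{2}(\Omega)} \quad \forall v \in H_0^1(\Omega) \qquad (12)$$

where $H_0^1(\Omega)$ is the real Hilbert space defined as follows:

$$H_0^1(\Omega) = \left\{ u \in L^{2}(\Omega) : \frac{\partial u}{\partial x_1} \in L^{2}(\Omega) \text{ and } \frac{\partial u}{\partial x_2} \in L^{2}(\Omega),\ u|_{\partial\Omega} = 0 \right\} \qquad (13)\text{-}(14)$$

equipped with the scalar product:

$$l(u, v) = \int_{\Omega} \left(\nabla u \cdot \nabla v + \sigma u v\right) d\Omega \quad \forall u, v \in H_0^1(\Omega) \qquad (15)$$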

Following  the  classical  domain  decomposition  technique  prob¬ 
lem  ( 12)  is  decoupled  into  a  set  of  problems  within  each  sub- 
domain  plus  cin  additioneJ  problem  at  the  interfaces  F : 


where  ker(7)  is  the  kernel  of  operator  7,  and  its  orthogonal 
complement  K'^  is  defined  as: 

=  {u  €  //o(n)  :  l(u,  uo)  =  0  V  VO  €  A'}  (21) 

Therefore,  the  solution  u  €  Ho(Q)  of  problem  (12)  can  be 

uniquely  decomposed  as 

u  =  uo  +  o,  uo  £  A"  and  ti  £  A'’*"  (22) 

Since  the  restriction  70  of  the  operator  7  to  A'"*‘  is  an  isometric 
isomorphism  between  and  it  follows  that 

Vu  €/<:■'  3!  V;  €  Ho‘^"(r)  :  u  =  7o”‘ti  (23) 

Identity  (22)  can  be  reformulated  as: 

u  =  uo -t- 7^ V  with  uo  £  A"  and  tp  ^  (24) 

Thus,  problem  (12)  can  be  easily  proven  to  be  equivalent  to 
the  set  of  the  two  following  ones: 

Problem  (PI):  find  uo  €  A'  such  that: 

2(«o,  t>o)  =  (/,  vo)i,2(fi)  V  Vo  £  A  (25) 

Problem  (P2):  find  tp  €  such  that: 

^(7o'’'l/’.7^‘«)  =  (/.7o'''«)i.=(n)V2  €  Ho^^(r)  (26) 

Problem  Pi  is  nothing  else  than  the  solution  of  N  decoupled 
elliptic  problems  with  homogeneous  Dirichlet  boundary  con¬ 
ditions  on  both  dQ  and  F.  To  build  its  discrete  conterpcirt, 
a  steindard  Legendre  collocation  method  has  been  used  (  [7]). 
To  this  end,  the  imknowns  are  decomposed  into  a  series  of 
Legendre  polynomials: 

Nj:  Ny 

«(^. ^  ^  Uk,tL'‘(x)L\y),  (27) 

*=i  i=i 

where  L*  is  the  k‘^  Legendre  polynomicJ.  Likewise,  the  func¬ 
tion  V  is  decomposed  into  a  series  of  Lagrange  polynomials 
constructed  on  the  Gauss-Lobatto  nodes. 

JVx  Ny 

.(.,»)  =  EE  Vk,iLa'‘{x)La‘{y),  (28) 

k=l  (=1 


F  =  (f2\no)\9f2  with  Qo  =  (16) 

Let  be  the  completion  of  the  normed  vector  space  S 

defined  as: 

S  =  {2  €  C“(F)  :  3<P  €  Co°°(f2)  such  that  2  =  <^r} 

||2||  =  inf  ||<A||H^(n)  (17) 

*6C~(n) 
tpY'  =  z 

where  <pr  is  the  restriction  of  <p  on  F. 

The  linearity  and  continuity  of  the  operator 

4>eCS^{Q)^<Pr  (18) 

into  Lfo^^(F)  and  the  fact  that  Co°(f2)  is  dense  in  Ho{Fl)  leads 
to  the  existence  2uid  uniqueness  of  a  linear  xind  continuous 
operator  7  (trace  operator)  from  Hq{Q)  onto  //q^^(F)  defined 
as 

70  =  0r  V0  £  Ho(0)  (19) 

The  7  operator  tillows  to  identify  two  closed  mutually  orthog¬ 
onal  subspaces 

A'  =  ker(7)  =  {wo  G  ffo  (^)  :  7“o  =  0}  (20) 


where  La*  is  the  Lagrange  polynomial  for  which 
La^(xi)  =  Sk,i.  By  taking  into  account  the  expression  of  u 
eind  V  and  by  replacing  the  scalar  product  2(., .)  by  its  discrete 
counterpart,  the  differential  problem  reads: 

find  Uk,i,  1  <  ^  <  Nx  ,  1  <  2  <  such  that 

,[-(^)fc,i  +  auk.i  -  fk,i\La' {xk)La^ {yi)izJki^i  =  0  Vt,  j 

(29) 

where  Wk  are  the  Gauss  Lobatto  weights  for  the  quadrature. 
Using  the  definition  of  Lagrange  polynomials  [Laj(xi)  =  Pij), 
the  disretized  equations  become: 

find  tik,t,  1  ^  L  <  Nx  ,  1  <  2  <  A^si 

such  that  -f  auk,i  =  /k,i 

j 

An efficient procedure to solve the given algebraic problem
will be given in the next section.
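The collocation setting above can be illustrated numerically. The following is a minimal sketch, not the authors' code: it assumes the standard Gauss-Lobatto-Legendre rule on [-1, 1] (nodes at the endpoints and at the zeros of the derivative of the Legendre polynomial of degree n), and the function names `gauss_lobatto` and `lagrange` are hypothetical. It checks the cardinal property of the Lagrange polynomials at the nodes, which is what collapses the discrete scalar product to a single term.

```python
import numpy as np
from numpy.polynomial import legendre as leg

def gauss_lobatto(n):
    """Nodes and weights of the (n+1)-point Gauss-Lobatto-Legendre rule on [-1, 1]."""
    cn = np.zeros(n + 1)
    cn[n] = 1.0                                  # Legendre-series coefficients of P_n
    interior = leg.legroots(leg.legder(cn))      # zeros of P_n'
    x = np.concatenate(([-1.0], np.sort(interior.real), [1.0]))
    w = 2.0 / (n * (n + 1) * leg.legval(x, cn) ** 2)
    return x, w

def lagrange(k, x, s):
    """k-th Lagrange polynomial built on the nodes x, evaluated at the points s."""
    others = np.delete(x, k)
    return np.prod((s[:, None] - others) / (x[k] - others), axis=1)

n = 6
x, w = gauss_lobatto(n)
# Cardinal property La^k(x_i) = delta_{k,i}: the matrix below is the identity.
cardinal = np.array([lagrange(k, x, x) for k in range(n + 1)])
# The rule integrates polynomials up to degree 2n - 1 exactly.
integral_x10 = np.sum(w * x ** 10)               # exact value is 2/11
```

Because each Lagrange polynomial is 1 at its own node and 0 at all the others, testing the discrete residual against one of them simply picks out the residual at that node, multiplied by the corresponding weights.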

As concerns problem P2, if $\{\xi_i\}$, $i = 1, \ldots, \infty$, is a set of linearly
independent functions which constitute a basis for $H_{00}^{1/2}(\Gamma)$,
then the discrete version of problem P2 reads as:

$$l\Big(\gamma_0^{-1} \sum_{i=1}^{M} a_i \xi_i,\ \gamma_0^{-1} \xi_j\Big) = (f, \gamma_0^{-1} \xi_j)_{L^2(\Omega)} \quad \forall j = 1, \ldots, M \qquad (31)$$

Typically $M$ corresponds exactly to the number of points on
the interface. To set up an algebraic equivalent of (31) the
operator $\gamma_0^{-1}$ should be explicitly formulated. In practice, the
operator $\gamma_0^{-1}$ is never required if an iterative procedure is
introduced to solve problem P2. To illustrate this point, it must
be remarked that $u^*$ must satisfy the orthogonality condition:

$$l(u^*, u_0) = 0 \quad \forall u_0 \in A' \qquad (32)$$

which corresponds to the solution of $N$ elliptic problems (25)
with Dirichlet boundary conditions: homogeneous on $\partial\Omega$ and
to be iteratively determined on $\Gamma$.

To provide at each iteration $k$ the condition on $\Gamma$ for problem
(32), Green's formula is applied to (31)

$\forall i = 1, \ldots, M$ (33)

where $u^k$ denotes the solution at iteration $k$ of
problem (32) and $[\partial u^k / \partial n]$ represents the jump of the normal
derivatives on $\Gamma$. $R^k$ is the residual at iteration $k$, from which
the updating of the boundary value can be obtained
within the chosen iterative procedure.

The convergence rate of the iterative procedure strongly depends
on the choice of the basis [8]. For the present work
the basis functions proposed by Ovtchinnikov [8] have been
used. These constitute a nearly optimal basis, in the sense
that the condition number of system (31) is bounded by a
constant independent of $M$, where $M$ is the dimension of the
subspace of $H_{00}^{1/2}(\Gamma)$ generated by span$\{\xi_i\}$, $i = 1, \ldots, M$.

In view of the character of the algebraic problem (symmetric
positive definite) the conjugate gradient method has been employed to
solve problem (31).
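For reference, the conjugate gradient iteration only needs the action of the operator on a vector, which is why the inverse trace operator never has to be formed explicitly. The sketch below is a generic textbook version, not the authors' implementation: `conjugate_gradient` and `apply_S` are hypothetical names, and a small random symmetric positive definite matrix stands in for the interface system (31).

```python
import numpy as np

def conjugate_gradient(apply_S, b, tol=1e-12, max_iter=200):
    """Solve S a = b for a symmetric positive definite operator S,
    given only as a function apply_S(vector) -> vector."""
    a = np.zeros_like(b)
    r = b - apply_S(a)              # initial residual
    p = r.copy()                    # initial search direction
    rs_old = r @ r
    for _ in range(max_iter):
        if np.sqrt(rs_old) < tol:
            break
        Sp = apply_S(p)
        alpha = rs_old / (p @ Sp)   # step length along p
        a += alpha * p
        r -= alpha * Sp
        rs_new = r @ r
        p = r + (rs_new / rs_old) * p
        rs_old = rs_new
    return a

# Hypothetical stand-in for the interface system: a small SPD matrix.
rng = np.random.default_rng(0)
B = rng.standard_normal((8, 8))
S = B @ B.T + 8.0 * np.eye(8)
b = rng.standard_normal(8)
a = conjugate_gradient(lambda v: S @ v, b)
```

In the multi-domain setting each call to `apply_S` would correspond to one round of subdomain Dirichlet solves followed by the evaluation of the normal derivative jumps, so a basis with a bounded condition number directly bounds the number of such rounds.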


A final remark concerns the importance of achieving an efficient
technique to invert the decoupled Dirichlet problems
(P1). To this end, we make use of a modified matrix diagonalization
approach [9]. The Legendre collocation approximation
to one of the mentioned sub-problems might be re-written as:

$$D U + U D^T - \alpha U = F \qquad (36)$$

where $D$ is the collocated Lagrange second derivative matrix
acting on the subdomain internal nodes, $U$ is the unknown
matrix ordered by rows, and $F$ is a modified right hand side
matrix taking into account the effects of the boundary values.
First, we determine the eigenvalues of $D$, its left and right
eigenvector systems (ordered by columns) and the respective
inverses:

$$E_R^{-1} D E_R = \Lambda \qquad (37)$$

$$E_L^{-1} D^T E_L = \Lambda \qquad (38)$$

Matrices $E_R$, $E_L$, $E_R^{-1}$, $E_L^{-1}$ and the diagonal eigenvalue matrix
$\Lambda$ are computed and stored in a pre-processing stage. Indicating
with $\hat U = E_R^{-1} U E_L$ and with $\hat F = E_R^{-1} F E_L$, we invert
the diagonalised problem:

$$\Lambda \hat U + \hat U \Lambda - \alpha \hat U = \hat F \qquad (39)$$

and recover the final solution as:

$$U = E_R \hat U E_L^{-1} \qquad (40)$$

Having solved the eigenvalue problem in a pre-processing
stage, the recursive solution cost turns out to be of order $n^3$
operations, $n$ being the number of nodes used to discretize
each direction within a single subdomain.
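The diagonalization procedure can be sketched in a few lines. This is an illustration under stated assumptions rather than the authors' code: the equation is taken in the form $D U + U D^T - \alpha U = F$, the matrix `D` is a hypothetical diagonalizable test matrix with real negative eigenvalues (not an actual collocation matrix), and the names `preprocess` and `solve` are invented for the example.

```python
import numpy as np

def preprocess(D):
    """Pre-processing stage: eigen-systems of D and D^T (D assumed diagonalizable)."""
    lam, E_R = np.linalg.eig(D)          # D = E_R diag(lam) E_R^{-1}
    E_R_inv = np.linalg.inv(E_R)
    E_L = E_R_inv.T                      # D^T = E_L diag(lam) E_L^{-1}
    E_L_inv = E_R.T
    return lam, E_R, E_R_inv, E_L, E_L_inv

def solve(F, alpha, lam, E_R, E_R_inv, E_L, E_L_inv):
    """Solve D U + U D^T - alpha U = F with the stored eigen-systems."""
    F_hat = E_R_inv @ F @ E_L                              # transform the right-hand side
    U_hat = F_hat / (lam[:, None] + lam[None, :] - alpha)  # diagonal problem, entry by entry
    return E_R @ U_hat @ E_L_inv                           # back-transform

n, alpha = 6, 2.0
rng = np.random.default_rng(1)
P = np.eye(n) + 0.1 * rng.standard_normal((n, n))
D = P @ np.diag(-np.arange(1.0, n + 1)) @ np.linalg.inv(P)  # eigenvalues -1, ..., -n
F = rng.standard_normal((n, n))

stored = preprocess(D)                   # done once
U = np.real(solve(F, alpha, *stored))    # repeated for every new right-hand side
residual = np.linalg.norm(D @ U + U @ D.T - alpha * U - F)
```

Each call to `solve` consists of four matrix-matrix products and one element-wise division, which is where the order $n^3$ operation count comes from.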


3.2  Solution procedures for multiple problems

When multiple solutions for an elliptic problem of the form
(10, 11) are required (i.e., within a fractional step time
advancement), it turns out to be much more efficient to invert,
once and for all (in a pre-processing stage), the abstract operator
$S$ handling the interface unknowns.

To introduce the method let us reconsider problem (10, 11).
With reference to the previous section, we reconsider the same
differential problems: problem P1 (25) and the differential
problem leading to the solution on the interface (P2), here
given in the following abstract form:

$$S a_k = h_k \qquad (34)$$

where the $a_k$'s refer to the Galerkin coefficients of the solution
on the interface, and the $h_k$'s are the Galerkin coefficients
of the jump of the normal derivatives produced by the solution
of the $N$ problems P1.

Let us now consider the $M$ problems (see (31)):

$$S a_k = \delta_{ik}, \quad i = 1, \ldots, M \qquad (35)$$

meaning problems with a jump of the normal derivatives leading
to a unit Galerkin coefficient $i$ and zero values for all
the other coefficients $j$ ($j \ne i$). Successive inversions, through
the iterative procedure outlined in the previous section, allow
for constructing by columns the operator $S^{-1}$. The latter
might, then, be considered as a capacitance Galerkin matrix
that, applied to the Galerkin coefficients of the computed normal
derivative jumps (problem P1), releases the coefficients of
the solution on the interface that guarantee zero weak normal
derivative jumps between subdomains. It is also remarked
that the matrix $S^{-1}$ is symmetric because it is obtained from the
discretization of a self-adjoint problem (33). Of course this is a
nice property leading to an evident storage reduction.
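The column-by-column construction described above can be sketched as follows. This is an illustration only: `build_capacitance` and `solve_interface` are hypothetical names, a small random symmetric positive definite matrix stands in for the operator $S$, and a direct solve stands in for the iterative interface procedure.

```python
import numpy as np

def build_capacitance(solve_interface, M):
    """Assemble S^{-1} column by column from M solves with unit right-hand sides."""
    C = np.empty((M, M))
    for i in range(M):
        e_i = np.zeros(M)
        e_i[i] = 1.0                     # unit jump coefficient i, zero elsewhere
        C[:, i] = solve_interface(e_i)   # column i of S^{-1}
    return C

# Hypothetical stand-in for the interface operator S (symmetric positive definite).
M = 5
rng = np.random.default_rng(2)
B = rng.standard_normal((M, M))
S = B @ B.T + M * np.eye(M)
# np.linalg.solve stands in for the iterative interface solver of the previous section.
C = build_capacitance(lambda h: np.linalg.solve(S, h), M)

h = rng.standard_normal(M)   # jump coefficients produced by the subdomain solves
a = C @ h                    # interface coefficients from one matrix-vector product
```

Once `C` is stored, every later time step replaces the interface iteration by a single matrix-vector product, and the symmetry of `C` means that only one triangle needs to be kept.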


3.3  Projection  step  treatment 

If each single differential problem is tackled with the algorithm
described in the previous section, at the end of each time step
the solution is equivalent to the one hypothetically achieved by
solving the whole domain at once.

The last statement requires some comments. When finite dimensional
approximations of the space where the solution is
sought are considered, numerical problems might arise within
the fractional step algorithm in the regions neighbouring the
interfaces. In particular, when the projection step (5) is considered,
a straightforward use of the results obtained with the present multidomain
method leads to a discontinuous value of the divergence
free velocity field along the interfaces.

From the numerical point of view, these discontinuities, even
if limited to a set of measure zero ($\Gamma$), might introduce an
artificial "numerical boundary layer" that the whole time integration
procedure cannot damp out and that might lead
to catastrophic instabilities. To avoid such a drawback two
solutions are possible. The first one relies upon increasing
the dimension of the approximation subspaces to reduce the
jumps at the interface. The second one consists in replacing
the gradient of the pressure in equation (5) with an equivalent
function in $L^2(\Omega)$, which differs from the original one
only along sets of measure zero. In particular, the gradient
$Q = \nabla(\phi^{n+1} - \phi^n)$ of the solution, achieved by solving (7)
with the previously outlined multi-domain spectral method, is
substituted with the vector function $\tilde Q$ defined as:

$$\tilde Q_i(x,y) = Q_i(x,y) \quad \forall (x,y) \in \Omega \setminus \Gamma \qquad (41)$$

and, $\forall (x,y) \in \Gamma_{rs}$, by a combination of the one-sided values
$Q_i^r$ and $Q_i^s$ weighted with $\omega_j^r$ and $\omega_j^s$, for each component $i, j = 1, 2$,

where $\Gamma_{rs} = \bar\Omega_r \cap \bar\Omega_s$, $\omega_j^r$ ($\omega_j^s$) is the Gauss-Legendre quadrature
weight along $\Gamma_{rs}$ (either $j$ or $i$) corresponding to the node
$(x,y)$ in the subdomain $\Omega_r$ ($\Omega_s$) and $Q^r$ ($Q^s$) is the restriction
of $Q$ in $\Omega_r$ ($\Omega_s$) evaluated in $(x,y)$.

3.4  Accuracy tests

To test the accuracy of the proposed spectral multi-domain
algorithm we have considered the classical Taylor-Green analytical
test case for the 2-dimensional incompressible Navier-Stokes
equations:

$$u(x,y) = -\cos(\pi x) \sin(\pi y)\, e^{-2\pi^2 \nu t} \qquad (42)$$

$$v(x,y) = \sin(\pi x) \cos(\pi y)\, e^{-2\pi^2 \nu t} \qquad (43)$$

$$p(x,y) = -\tfrac{1}{4} \left( \cos(2\pi x) + \cos(2\pi y) \right) e^{-4\pi^2 \nu t} \qquad (44)$$

on the domain $\Omega = [0,2] \times [0,2]$. The following set of boundary
conditions has been applied:

• on the edges $x = 0$ and $x = 2$ homogeneous Dirichlet
conditions for $u$ and homogeneous Neumann conditions
for $v$.

• on the edges $y = 0$ and $y = 2$ homogeneous Dirichlet
conditions for $v$ and homogeneous Neumann conditions
for $u$.
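That the Taylor-Green fields form an exact solution can be checked numerically. The sketch below is not part of the original work: it assumes the standard decay factors $e^{-2\pi^2 \nu t}$ for the velocity and $e^{-4\pi^2 \nu t}$ for the pressure, an arbitrary viscosity `NU`, and hypothetical helper names `fields` and `residuals`. It evaluates the continuity and momentum residuals with central finite differences.

```python
import numpy as np

NU = 0.01   # assumed kinematic viscosity, chosen only for the illustration

def fields(x, y, t, nu=NU):
    """Taylor-Green velocity components and pressure with the assumed decay rates."""
    d = np.exp(-2.0 * np.pi**2 * nu * t)
    u = -np.cos(np.pi * x) * np.sin(np.pi * y) * d
    v = np.sin(np.pi * x) * np.cos(np.pi * y) * d
    p = -0.25 * (np.cos(2 * np.pi * x) + np.cos(2 * np.pi * y)) * d**2
    return np.array([u, v, p])

def residuals(x, y, t, h=1e-4, nu=NU):
    """Continuity and momentum residuals from central finite differences."""
    f0 = fields(x, y, t, nu)
    fx = (fields(x + h, y, t, nu) - fields(x - h, y, t, nu)) / (2 * h)
    fy = (fields(x, y + h, t, nu) - fields(x, y - h, t, nu)) / (2 * h)
    ft = (fields(x, y, t + h, nu) - fields(x, y, t - h, nu)) / (2 * h)
    lap = (fields(x + h, y, t, nu) + fields(x - h, y, t, nu)
           + fields(x, y + h, t, nu) + fields(x, y - h, t, nu) - 4 * f0) / h**2
    u, v, _ = f0
    div = fx[0] + fy[1]
    mom_x = ft[0] + u * fx[0] + v * fy[0] + fx[2] - nu * lap[0]
    mom_y = ft[1] + u * fx[1] + v * fy[1] + fy[2] - nu * lap[1]
    return div, mom_x, mom_y

xs, ys = np.meshgrid(np.linspace(0.0, 2.0, 9), np.linspace(0.0, 2.0, 9))
div, mom_x, mom_y = residuals(xs, ys, t=0.5)
worst = max(np.abs(div).max(), np.abs(mom_x).max(), np.abs(mom_y).max())
```

All three residuals vanish up to the finite-difference truncation error, so any error measured against these fields is attributable to the discretization under test.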

The tests have concerned both time and space accuracy. The
latter has been measured imposing an extremely small value
for the time step. Different configurations have been considered
and the error has always been measured according to
the $L^2(\Omega)$ norm. The following table, showing the results of
different tests with different domain partitioning configurations,
summarizes the accuracy measurements both for one of
the velocity components and for the pressure.


Num. doms   Nodes per dom.   Total nodes   Error L2 (velocity)   Error L2 (p)
1           8                64
1           11               121
1           14               196
4           8                256
4           11               484
4           14               784

From the given results, the accuracy of the solution is quite
evident. It is remarked that the convergence rate for the pressure
is lower than for the velocity but is, nevertheless, still spectral.
In order to measure the time accuracy of the present scheme,
we considered the same test case with a prescribed discretization
in space (4 subdomains with 14 x 14 nodes each) sufficient to
deliver optimal spatial accuracy. In the following table we
present the relative $L^2(\Omega)$ norm of the velocity error achieved
after 1 time unit.


time  step  size 

Relative  L2  x-component  velocity  error 

0.1 

.4  X  10“‘ 

0.01 

.5  X  lO"'^ 

0.001 

.3  X  10“^ 

0.0001 

.5  X  10“' 

From the results it turns out that the adopted scheme is second
order in time, at least for the velocity.


4  PARALLEL  IMPLEMENTATION 

As concerns the parallel implementation of the given algorithm,
we have used a slightly modified version of the master-slave
computational model. In particular, the major difference
with respect to the classical model is that our master actively
cooperates with the slaves during the calculation phase,
while in the standard version the master is only required
to distribute initial data and to gather the results. In our
implementation, the activities are shared between master and
slaves as follows. At the beginning of the computation, the
master process calculates the guess values for the Dirichlet
problems. These values are then transmitted to the slave processes:
each of the slaves solves the Dirichlet problems for
the assigned domains; it should be noted that, in this case,
the domain decomposition (which allows the slaves to operate
in parallel) derives directly from the multi-domain approach.
After this first phase, the slaves transmit the calculated values
at the domain interfaces to the master, which calculates the
new values by applying a Conjugate Gradient algorithm, and
communicates these values to the slaves for the next iteration.
The main causes of inefficiency in using parallel architectures
are uneven load balancing and communication overheads.
In general, the multidomain technique can generate
load balancing problems because the size and/or computational
cost of blocks can widely differ; however, in our case each
domain has the same number of points. Thus, if the number
of domains is a multiple of the number of processors, we obtain
an optimal load balancing. The communication overhead is
mainly related to the Conjugate Gradient algorithm: at each
time iteration, data need to be exchanged between processors
containing adjacent domain interfaces and the master processor.
Because of the sequential flow of these activities, it is
not possible to overlap computation and communication, so
the time spent for these communications can represent a non-negligible
part of the overall computing time.
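The division of work between master and slaves can be sketched on a toy problem. The example below is an analogy, not the authors' Fortran 77/PVM code: it solves a 1-D model problem $-u'' = f$ with second-order finite differences instead of the spectral method, uses a Python thread pool in place of message passing, and all names (`slave`, `gather`, `run`) and sizes are assumptions made for the illustration. The slaves solve independent subdomain Dirichlet problems, and the master runs the conjugate gradient iteration on the interface values.

```python
import numpy as np
from concurrent.futures import ThreadPoolExecutor

K, n = 4, 15                       # subdomains and interior nodes per subdomain (assumed)
h = 2.0 / (K * (n + 1))            # uniform grid spacing on [0, 2]
A = 2 * np.eye(n) - np.eye(n, k=1) - np.eye(n, k=-1)   # local operator for -u'' (times h^2)

def slave(task):
    """Slave: solve one subdomain Dirichlet problem for given boundary values."""
    rhs, left, right = task
    b = rhs.copy()
    b[0] += left
    b[-1] += right
    return np.linalg.solve(A, b)

def gather(pool, g, rhs_blocks, rhs_gamma):
    """Master: scatter interface values g, gather the subdomain solutions and
    return the residual of the equations at the interface nodes."""
    bc = np.concatenate(([0.0], g, [0.0]))      # homogeneous Dirichlet on the outer boundary
    sols = list(pool.map(slave, [(rhs_blocks[k], bc[k], bc[k + 1]) for k in range(K)]))
    res = np.array([rhs_gamma[j] + sols[j][-1] + sols[j + 1][0] - 2 * g[j]
                    for j in range(K - 1)])
    return res, sols

def run(f):
    x = np.linspace(0.0, 2.0, K * (n + 1) + 1)
    rhs = h**2 * f(x)
    idx = np.arange(1, K) * (n + 1)             # global indices of the interface nodes
    blocks = [rhs[k * (n + 1) + 1:(k + 1) * (n + 1)] for k in range(K)]
    zero_blocks, zero_gamma = [np.zeros(n)] * K, np.zeros(K - 1)
    with ThreadPoolExecutor(max_workers=K) as pool:
        g = np.zeros(K - 1)                     # master's initial guess on the interfaces
        r, _ = gather(pool, g, blocks, rhs[idx])
        p = r.copy()
        for _ in range(50):                     # conjugate gradient on the interface unknowns
            if np.linalg.norm(r) < 1e-13:
                break
            Sp = -gather(pool, p, zero_blocks, zero_gamma)[0]
            alpha = (r @ r) / (p @ Sp)
            g = g + alpha * p
            r_new = r - alpha * Sp
            p = r_new + ((r_new @ r_new) / (r @ r)) * p
            r = r_new
        _, sols = gather(pool, g, blocks, rhs[idx])
    u = np.zeros_like(x)
    u[idx] = g
    for k in range(K):
        u[k * (n + 1) + 1:(k + 1) * (n + 1)] = sols[k]
    return x, u

x, u = run(lambda s: np.pi**2 * np.sin(np.pi * s))   # exact solution is sin(pi x)
```

Every conjugate gradient step triggers one scatter/gather round, which mirrors the strictly sequential alternation of slave computation and master update discussed above.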

The parallel version of the code has been developed for message
passing environments. In particular, the code has been
written in Fortran 77 plus PVM 3.3 communication primitives.
In order to meet the goal of overlapping computation
and communication, non-blocking communication primitives
have been used. Note that the parallelism is exploited only
among slaves: the master and the slaves cannot operate in
parallel. Anyway, as the greater part of the computation is
assigned to the slaves, the performance obtained on various
homogeneous parallel systems is quite good.

4.1  Performance  evaluations 

For the tests, we have used two different parallel machines.
The first is a CONVEX C210-MPP0 with a vector processor
and 4 HP 730 scalar processors connected via FDDI. The second
machine is a MEIKO CS2 with 18 SuperSPARC processors
connected through a switching network. Both machines
are distributed memory MIMD parallel computers. The tests
have been performed by using a number of domains that is a multiple
of the number of processors used, so that load balancing is
guaranteed. Hence, the causes of the loss of efficiency are the
time $t_c$ spent for the communication and the idle time $t_w$ of
the slaves waiting for the master results. Note that, while the
time $t_w$ is independent of the number $N$ of processors, the
time $t_c$ increases with $N$: so, for a given problem, a
linear decrease of the efficiency is expected.
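The expected trend can be made explicit with a simple timing model. The following is a sketch under assumptions not stated in the text: $T_1$ denotes the single-processor computing time, the subdomain work is assumed to divide evenly among the $N$ processors, and $t_w$ and $t_c(N)$ are the idle and communication times defined above.

$$T_N = \frac{T_1}{N} + t_w + t_c(N), \qquad E(N) = \frac{T_1}{N\, T_N} = \frac{1}{1 + N\, \dfrac{t_w + t_c(N)}{T_1}} \approx 1 - N\, \frac{t_w + t_c(N)}{T_1}$$

To first order in the overhead, the efficiency therefore falls off in proportion to $N$ as long as the overhead is dominated by the constant $t_w$, and somewhat faster once the growth of $t_c$ with $N$ becomes significant. The model also indicates why larger subdomains (larger $T_1$) give better efficiency for the same number of processors.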

In Figs. (1-4) the results obtained on the Meiko machine are
shown. Note that the values of efficiency are quite good, especially
when the number of points for each domain increases.
Moreover, when the number of processors grows the efficiency
decreases linearly, as expected.

Figures (5,6) show a comparison of the results obtained for
both the Meiko CS2 and the Convex MPP0 machines. It
should be noted that the Convex machine performs better
than the Meiko when two processors are used; on the contrary, by
increasing the number of processors the performance of the
Meiko is better. This behaviour is essentially related to the
different characteristics of the interconnection networks; the
FDDI network of the Convex allows very fast communication
between two processors at a time (the optical fiber is a common
shared resource). On the other hand, the CS2 switching network
allows different communications to be executed simultaneously,
so reducing the overall communication time (as a matter
of fact, the presence of properly designed communication
processors, which handle the communication on behalf of the
SPARC processors, also has to be taken into account).

Fig. 2: 12 domains with 15 x 15 nodes; efficiency versus Nproc

To further reduce the computing time, we have also used heterogeneous
systems. In fact, whenever the execution of different
tasks constituting the same program is strictly sequential,
heterogeneous processing can help in enhancing performance
by placing a task on the most suitable machine for that task.
To this end, tests have been performed by placing the master
process on a vector computer for a more efficient calculation,
and the slave processes on a parallel homogeneous system with
scalar processors.

However, in our case the time spent by the master is a negligible
part of the total computing time; so, the tests performed
by using a heterogeneous environment have shown no appreciable
improvements.


5  3-DIMENSIONAL  EXTENSION 

In this section, we present an extension of the method which allows
for the simulation of three-dimensional flows with one periodic
direction. For this class of flows it is possible to take
advantage of the classical Fourier decomposition of the flow
variables in the periodic direction. This choice reduces
all the three-dimensional scalar differential problems in
the physical space (momentum equations and pressure correction
equation) to a sequence of two-dimensional scalar
differential problems in terms of the transformed variables.
Once the two-dimensional problems are set up, it is possible
to take advantage of the given multi-domain solution method
to solve them efficiently.

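In particular, let

$$u_i(x,y,z) = \sum_{k=-N/2}^{N/2-1} \hat u_{i,k}(x,y)\, e^{Ikz}, \quad i = 1,2,3, \qquad (45)$$

$$p(x,y,z) = \sum_{k=-N/2}^{N/2-1} \hat p_k(x,y)\, e^{Ikz}, \qquad (46)$$

$$u_i^*(x,y,z) = \sum_{k=-N/2}^{N/2-1} \hat u_{i,k}^*(x,y)\, e^{Ikz}, \quad i = 1,2,3, \qquad (47)$$

and

$$\delta u_i(x,y,z) = \sum_{k=-N/2}^{N/2-1} \delta\hat u_{i,k}(x,y)\, e^{Ikz}, \quad i = 1,2,3, \qquad (48)$$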

with $I = \sqrt{-1}$. Applying the same methodology as for the
2-dimensional case, the 3-dimensional algorithm can be
reformulated as:

For every $n = 0, 1, \ldots$ ($n$ being the time counter)

1 For $i = 1,2,3$, solve for $\hat u^*_{i,k}$ (the predicted velocity field)
the momentum equations, for $k = -N/2, \ldots, N/2 - 1$:

5.1  A 3-D test case

To validate the proposed 3-D algorithm we have considered
a direct numerical simulation (DNS) of a low Reynolds number
fully turbulent channel flow. This flow ([10]) might be
considered periodic both in the streamwise and spanwise directions
if the dimensions of the computational box are made large
enough. In the present case we took as Fourier direction the
streamwise one, while, to impose periodicity spanwise, we
imposed the edges of the subdomains to be neighbours of one
another. All the lengths have been made non-dimensional
with the channel half-height, and the velocity has been
non-dimensionalized with the center-line velocity. With this
selection the Reynolds number $Re = U_c h/\nu$ has been fixed to
the value of 6000 and the computational box had dimensions
2, 2, 0.8 in the streamwise, wall-normal and spanwise directions,
respectively. The grid configuration in a section normal
to the mean flow is displayed in figure (7).


Fig. 6: 3 domains with 11 x 11 nodes; efficiency versus Nproc


2 Solve for $\hat p_k^{n+1}$ the pressure correction, for
$k = -N/2, \ldots, N/2 - 1$:


At  dxi  At 


dxf 


(50) 

3 For $i = 1,2,3$, update the velocity field, for
$k = -N/2, \ldots, N/2 - 1$:


At 


s(pr-p:) 

dXf 

/fc.p"+' 


if  i=l,2 
otherwise 


(51) 


The subscript $I$ has been introduced to stress the fact that the
collocated derivatives are computed in the two non-periodic
directions only. The term $rhs_{i,k}$ represents the $k$-th mode
of the transform of the right-hand side of the momentum
equation. The treatment of the boundary conditions is
straightforward and does not introduce any supplementary
difficulty. Despite its apparent complexity, this algorithm
presents the advantage that all the computations of the elliptic
terms take place in the transformed space (for the periodic
direction), leading to the full exploitation of the 2-dimensional
algorithm.
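The reduction from one three-dimensional problem to a set of two-dimensional ones rests on the fact that differentiation in the periodic direction becomes a multiplication by the wavenumber for each Fourier mode. The sketch below illustrates this with NumPy's FFT; the period $2\pi$, the sample field and the small $(x, y)$ grid are assumptions made for the example, while the 24 modes match the number used in the test case.

```python
import numpy as np

N = 24                                   # number of Fourier modes
z = 2.0 * np.pi * np.arange(N) / N       # assumed period 2*pi in the periodic direction
k = np.fft.fftfreq(N, d=1.0 / N)         # integer wavenumbers 0..N/2-1, -N/2..-1

# Sample field: a function of (x, y) times a periodic profile in z.
x, y = np.meshgrid(np.linspace(0, 1, 5), np.linspace(0, 1, 5), indexing="ij")
g = np.sin(np.pi * x) * np.cos(np.pi * y)
field = g[:, :, None] * np.sin(3.0 * z)[None, None, :]

modes = np.fft.fft(field, axis=2)            # one 2-D array of coefficients per wavenumber
d2_modes = -(k**2)[None, None, :] * modes    # d^2/dz^2 acts as multiplication by -k^2
d2_field = np.real(np.fft.ifft(d2_modes, axis=2))

err = np.max(np.abs(d2_field + 9.0 * field))  # exact second derivative is -9 * field
```

For an elliptic operator this means that each mode $k$ satisfies an independent two-dimensional Helmholtz-type problem in which $k^2$ is added to the zeroth-order coefficient, so the two-dimensional multi-domain solver can be reused mode by mode.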


Fig.  7:  Grid  configuration  in  the  normal  plane 

Five subdomains, the first and the last selected to embed
the wall sublayer, are used. Each subdomain contains 20 x 20
nodes, while in the Fourier direction 24 modes are employed.
The present case has been run on an IBM RS6000 360H workstation
with about 100 Mflops peak performance. The CPU time
required for each full time iteration is about 4.5 seconds
when the Galerkin capacitance matrix is computed and stored
in a pre-processing stage.


Fig. 8: Mean streamwise velocity near the wall


After having reached a statistically steady state we measured
some typical turbulence quantities to assess the quality of the
obtained results. In figure (8) we compare the obtained velocity
profile with the logarithmic wall law ($u/u_\tau = 2.5 \log(y^+) + 5$).
In figure (9) the computed turbulence intensities are compared
both with the experimental ones of Wei and Willmarth
([11]) at Reynolds 3850 and with the ones predicted by the
DNS of Jimenez and Moin ([10]) at Reynolds 5000.


Fig. 9: Near-wall turbulence intensities in wall coordinates
(legend: (v'v') present; (u'u') present; * Wei and Willmarth; o Jimenez and Moin)


Finally, an instantaneous velocity field in a plane normal to
the mean stream is displayed in figure (10). The agreement
of the data is acceptable and paves the way for the simulation
of more complex turbulent flow configurations. Indeed, when
more points are necessary, the present code might allow for a
full exploitation of MIMD computer architectures.


Fig.  10:  Instantaneous  normal  plane  velocity  field 


6  CONCLUSION 

The present work has been concerned with the solution of
the unsteady incompressible Navier-Stokes equations, using a
high order collocated spectral multi-domain method. The rationale
behind the choice and development of the method is
given both by the possibility of coupling the potential high
accuracy of spectral methods with the flexible framework offered
by multi-domain methods, and by the natural way in
which a parallel implementation of the present algorithm can
be achieved.

In particular, we have shown how the developed algorithm allows
for the solution of completely independent and balanced
sub-problems, leading to full exploitation of MIMD parallel
computers.


Moreover, the data from the channel DNS seem
to confirm the viability of the present algorithm to deal with
complex turbulent flow configurations. At the same time it
should be stressed that the capability of selecting the accuracy
in specific flow regions might prove to be a powerful tool
for resolved Large Eddy Simulations in complex configurations
(i.e., when approximate wall conditions are not available).

ABSTRACT

A parallel implementation of the three-dimensional
Navier-Stokes Rotorcraft flow solver TURNS is studied.
We investigate two modifications of the LU-SGS
operator to improve parallel performance. The first
is the Data-Parallel LU Relaxation (DP-LUR) technique.
This operator uses a Jacobi sweeping procedure
in place of the Gauss-Seidel sweeps in LU-SGS.
The resulting algorithm is very amenable to
parallel processing but requires significantly more
computational work. The second approach is a Hybrid
technique which maintains the nearest neighbor
communication patterns of DP-LUR but uses
the more efficient Gauss-Seidel sweeps of LU-SGS
for the on-processor computations. The TURNS
code, with the DP-LUR and Hybrid operators, is
implemented on the massively parallel Thinking Machines
CM-5 using a MIMD (i.e. requiring message
passing) approach. The convergence qualities
and the CPU time of the two implicit operators
are studied for an example calculation, computing
the quasi-steady three-dimensional flowfield around
a helicopter blade with subsonic and transonic tip
Mach numbers. Both the DP-LUR and Hybrid modifications
of LU-SGS show very good parallelism,
and maintain the convergence rate of LU-SGS. However,
the Hybrid method uses less overall CPU time
than DP-LUR.

1.  INTRODUCTION 

In recent years helicopters have proven to be economical
and convenient vehicles with their ability to
land, take off and maneuver in areas inaccessible to
fixed-wing aircraft. The ability to predict the flow
around helicopter rotors is vital for the control of
high-speed losses, vibration and noise.

Transonic flow is normally encountered on rotors in
high-speed forward flight. Various transonic flow
models have been used for the modeling of the transonic
aerodynamics around the rotor. The transonic
small disturbance potential formulation is the simplest
approximation used. A more accurate formulation
is the full potential formulation. Two examples
of these full potential formulations are the FPR [1]
(Full Potential Rotor) code, and the RFS2 [2] code.
The main advantage of the full potential rotor codes
is that they can provide a good solution at a low
cost (CPU time). These codes, however, require an
approximate wake model to calculate the induced
downwash. The wake models are based on simple
linear aerodynamics and, consequently, have a narrow
range of applicability.

A more accurate CFD method is the Transonic
Unsteady Rotor Navier Stokes (TURNS) code, recently
developed at NASA Ames by Srinivasan and
co-workers [3-5]. TURNS is capable of computing
the tip vortices and the entire vortical wake as a
part of the overall flowfield solution. The code has
been demonstrated to calculate accurately the three-dimensional
flow around the tip of a helicopter rotor
in both hover and forward flight at subsonic and
transonic flow conditions [3-11].

Recently, TURNS has been applied in a multidisciplinary
setting, computing a near-field CFD solution
that is then used as input for a Kirchhoff method
that predicts the far field noise [12]. The code is
currently used by NASA, the Army, various Universities,
and the major US helicopter companies.
However, one drawback of TURNS is the amount
of computation time it requires. An acceptable calculation
with TURNS requires a supercomputer of
Cray-class. A typical quasi-steady coarse-grid Euler
computation by TURNS requires about 30 minutes
of CPU time on a Cray C-90, while an unsteady
computation requires 3-4 hours. Fine-grid viscous
computations require considerably more time.

Parallel computers, which include massively parallel
supercomputers as well as workstation clusters, are
beginning to replace traditional vector supercomputers
for large scale computations due to their lower
cost and high peak execution rates. At present,
TURNS is inefficient on parallel machines. The main
bottleneck preventing better parallel efficiency is the
LU-SGS algorithm [16] used for the implicit time
step. The objective of our work is to study techniques
that will improve its efficiency. Thus, the
majority of this paper will focus on the LU-SGS algorithm
and some modifications thereof which improve
its parallel efficiency. Initial results of this
effort were presented in reference [13].

Although the TURNS code is primarily used for rotor
CFD calculations, the solution algorithm is the
same as in many other CFD methods. Consequently,
the parallelization procedures proposed here could
readily be used for other codes that use the LU-SGS
implicit operator.


2.  CODE  DESCRIPTION 

The governing equations for the TURNS code are
the unsteady, compressible, three-dimensional thin
layer Navier-Stokes equations. These equations are
applied in conservation form in a generalized body-conforming
curvilinear coordinate system

$$\partial_\tau Q + \partial_\xi E + \partial_\eta F + \partial_\zeta G = \frac{\epsilon}{Re}\, \partial_\zeta S + \mathcal{R} \qquad (1)$$

where $\tau = t$, $\xi = \xi(x,y,z,t)$, $\eta = \eta(x,y,z,t)$, and
$\zeta = \zeta(x,y,z,t)$. The coordinate system $(x,y,z,t)$
is attached to the blade. The vector of conserved
quantities is $Q$, and the inviscid flux vectors $E$, $F$,
and $G$ are

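$$Q = \frac{1}{J} \begin{bmatrix} \rho \\ \rho u \\ \rho v \\ \rho w \\ e \end{bmatrix}, \qquad
E = \frac{1}{J} \begin{bmatrix} \rho U \\ \rho u U + \xi_x p \\ \rho v U + \xi_y p \\ \rho w U + \xi_z p \\ U H - \xi_t p \end{bmatrix},$$

$$F = \frac{1}{J} \begin{bmatrix} \rho V \\ \rho u V + \eta_x p \\ \rho v V + \eta_y p \\ \rho w V + \eta_z p \\ V H - \eta_t p \end{bmatrix}, \qquad
G = \frac{1}{J} \begin{bmatrix} \rho W \\ \rho u W + \zeta_x p \\ \rho v W + \zeta_y p \\ \rho w W + \zeta_z p \\ W H - \zeta_t p \end{bmatrix} \qquad (2)$$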

where $H = (e + p)$ and $U$, $V$, and $W$ are the contravariant
velocity components (e.g. $U = \xi_t + \xi_x u + \xi_y v + \xi_z w$).
The cartesian velocity components $u$,
$v$, and $w$ are defined in the $x$, $y$, and $z$ directions,
respectively. The quantities $\xi_t$, $\xi_x$, $\xi_y$, and $\xi_z$ are the
coordinate transformation metrics and $J$ is the Jacobian
of the transformation. The pressure $p$ is related
to the conserved quantities through the perfect gas
equation of state

$$p = (\gamma - 1) \left[ e - \frac{\rho}{2} \left( u^2 + v^2 + w^2 \right) \right] \qquad (3)$$


The viscous flux vector $S$ is incorporated in the code
but the calculations given in this paper are all inviscid
(i.e. $\epsilon = 0$ in Eq. 1) so the viscous terms are not
described here. Details can be found in [4].

The governing equations are applied to an inertial
reference system that moves with the blade. Because
the blade is rotating, the system is continuously unsteady.
In order to get a quasi-steady starting solution,
the blade must be held in a fixed position.
This is done, in effect, by adding source terms to the
right hand side

$$\mathcal{R} = \begin{bmatrix} 0 \\ \Omega \rho v \\ -\Omega \rho u \\ 0 \\ 0 \end{bmatrix} \qquad (4)$$

where $\Omega$ is the angular velocity of the rotor. The
$\mathcal{R}$ vector is used only for the quasi-steady case to
get a starting solution. It is not used for unsteady
calculations.

The inviscid fluxes are evaluated using Roe's upwind
differencing [14] in all three directions. The use of
upwinding obviates the need for user-specified artificial
dissipation and improves the shock capturing
in transonic flowfields. Third order accuracy is obtained
using van Leer's MUSCL approach [15] and
flux limiters are applied so the scheme is Total Variation
Diminishing (TVD).

The final Euler discretized form of Eq. 1 in unfactored
implicit delta form is

$$\left[ I + h \left( \delta_\xi A^n + \delta_\eta B^n + \delta_\zeta C^n \right) \right] \Delta Q^n = -h\, RHS^n \qquad (5)$$

where

$$RHS^n = \delta_\xi E^n + \delta_\eta F^n + \delta_\zeta G^n \qquad (6)$$

$I$ is the identity matrix, $h$ is the time step (the formulation
is described more completely in [4]),
and $\Delta Q^n = Q^{n+1} - Q^n$. The 5x5 matrices $A$, $B$ and
$C$ are the Jacobians of the flux vectors with respect
to the conserved quantities (e.g. $A = \partial E / \partial Q$).


3.  IMPLICIT  OPERATOR 

The TURNS code uses the two-factor LU-SGS
(Lower-Upper Symmetric Gauss Seidel) algorithm
of Yoon and Jameson [16] for the implicit time step.
The LU-SGS algorithm has been used in a number
of well-known CFD codes (e.g. INS3D [17], OVERFLOW
[18]) primarily for its stability properties
with larger timesteps. Classic implicit methods such
as Beam-Warming approximate factorization have a
large factorization error (of order $\Delta t^3$) which further
restricts the size of the time step. The two-factor
LU-SGS method has enhanced stability along with
a reduction in factorization error (order $\Delta t^2$) that
make it an attractive alternative. Unfortunately, the
LU-SGS method is difficult to parallelize.


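The LU-SGS scheme resembles a typical LU factorization
scheme with diagonal preconditioning to increase
robustness. The scalar diagonal terms are
obtained by use of approximate Jacobians, avoiding
costly matrix inversions. The Jacobian terms A, B,
and C in Eq. 5 are split into "+" and "-" parts, with
positive parts constituting only the positive eigenvalues
and negative parts constituting only the negative
eigenvalues. The positive matrix is backward differenced
and the negative matrix is forward differenced,
as follows

$$\delta_\xi A = \delta_\xi^- A^+ + \delta_\xi^+ A^- \tag{7}$$

This splitting ensures diagonal dominance. Approximate
Jacobians are constructed using a spectral approximation

$$A^\pm = s^\pm \tfrac{1}{2}\left(A \pm \rho_A I\right) \pm \epsilon \rho_A I \tag{8}$$

where $\rho_A$ is the spectral radius of A (in the $\xi$ direction),

$$\rho_A = \max\left[|\lambda_A|\right] = |U| + a|\nabla\xi| \tag{9}$$

$\epsilon$ is some small value (e.g. 0.001), and $s^\pm$ is defined
as

$$s^\pm = \begin{cases} 1 & \text{if } \pm\left(U \pm a|\nabla\xi|\right) > 0 \\ 0 & \text{otherwise} \end{cases} \tag{10}$$

The same procedure is used in the $\eta$ and $\zeta$ directions
to form the B and C terms.

Substituting this development into Eq. 5, we arrive
at a system of the form

$$L D^{-1} U \Delta Q^n = -h\,RHS^n \tag{11}$$

where

$$\begin{aligned}
D &= I + h\left(\rho_A + \rho_B + \rho_C\right)_{j,k,l} \\
L &= D - h\left(A^+_{j-1} + B^+_{k-1} + C^+_{l-1}\right) \\
U &= D + h\left(A^-_{j+1} + B^-_{k+1} + C^-_{l+1}\right)
\end{aligned} \tag{12}$$

D is a diagonal matrix, and the two step LU decomposition
can be performed by

$$\begin{aligned}
L \Delta Q^* &= -h\,RHS^n \\
U \Delta Q^n &= D \Delta Q^*
\end{aligned} \tag{13}$$

which can also be written

$$\begin{aligned}
\Delta Q^* &= D^{-1} h \left[-RHS^n + \left(A^+_{j-1}\Delta Q^*_{j-1} + B^+_{k-1}\Delta Q^*_{k-1} + C^+_{l-1}\Delta Q^*_{l-1}\right)\right] \\
\Delta Q^n &= \Delta Q^* - D^{-1} h \left(A^-_{j+1}\Delta Q^n_{j+1} + B^-_{k+1}\Delta Q^n_{k+1} + C^-_{l+1}\Delta Q^n_{l+1}\right)
\end{aligned} \tag{14}$$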


In the first step of (14), sweeps updating $\Delta Q^*$ are
performed in the positive direction (that is, from 1
to jmax, kmax, lmax) through the solution domain.
The second step then computes $\Delta Q^n$ by sweeping
back through the domain in the opposite direction.
This algorithm can be vectorized using a hyperplane
approach, as outlined in [19]. Vectorization is done
across hyperplanes in which j+k+l = const. This is
outlined in Fig. 1.


Figure 1: Domain sweeping strategy used by LU-SGS
algorithm. Can vectorize on hyperplanes where
j+k+l = const.
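The two sweeps of LU-SGS can be sketched on a scalar model problem. The following Python sketch is not the TURNS implementation: it assumes a two-dimensional grid, constant scalar "Jacobians" a and b in place of the 5x5 matrices, ε = 0 and s = 1 in the spectral splitting, and zero values outside the grid. All function names are hypothetical. The helper factored_residual applies the factored operator to the result so that the two-step procedure can be checked against the system it is meant to solve.

```python
def split(a):
    """Scalar analogue of the spectral splitting with epsilon = 0.

    Returns (a_plus, a_minus, rho) with a_plus >= 0 and a_minus <= 0.
    """
    rho = abs(a)
    return 0.5 * (a + rho), 0.5 * (a - rho), rho


def lu_sgs(rhs, h, a, b):
    """Solve L D^-1 U dq = -h*rhs on an nj x nk grid with two sweeps."""
    nj, nk = len(rhs), len(rhs[0])
    ap, am, rho_a = split(a)
    bp, bm, rho_b = split(b)
    d = 1.0 + h * (rho_a + rho_b)            # scalar diagonal D
    # Step 1: forward sweep, L dq* = -h*rhs
    star = [[0.0] * nk for _ in range(nj)]
    for j in range(nj):
        for k in range(nk):
            prev_j = star[j - 1][k] if j > 0 else 0.0
            prev_k = star[j][k - 1] if k > 0 else 0.0
            star[j][k] = (-h * rhs[j][k] + h * (ap * prev_j + bp * prev_k)) / d
    # Step 2: backward sweep, U dq = D dq*
    dq = [[0.0] * nk for _ in range(nj)]
    for j in reversed(range(nj)):
        for k in reversed(range(nk)):
            next_j = dq[j + 1][k] if j < nj - 1 else 0.0
            next_k = dq[j][k + 1] if k < nk - 1 else 0.0
            dq[j][k] = star[j][k] - h * (am * next_j + bm * next_k) / d
    return dq


def factored_residual(dq, rhs, h, a, b):
    """Largest entry of |L D^-1 U dq + h*rhs| (round-off level if dq is exact)."""
    nj, nk = len(rhs), len(rhs[0])
    ap, am, rho_a = split(a)
    bp, bm, rho_b = split(b)
    d = 1.0 + h * (rho_a + rho_b)

    def at(f, j, k):
        return f[j][k] if 0 <= j < nj and 0 <= k < nk else 0.0

    # y = D^-1 U dq
    y = [[(d * dq[j][k] + h * (am * at(dq, j + 1, k) + bm * at(dq, j, k + 1))) / d
          for k in range(nk)] for j in range(nj)]
    # L y + h*rhs
    return max(abs(d * y[j][k] - h * (ap * at(y, j - 1, k) + bp * at(y, j, k - 1))
                   + h * rhs[j][k])
               for j in range(nj) for k in range(nk))
```

Each point in the forward sweep needs the already-updated values at j-1 and k-1, and each point in the backward sweep needs those at j+1 and k+1. This data dependence is the recursion that allows vectorization only across hyperplanes.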

While the hyperplane approach leads to good vector
execution rates, it is difficult to parallelize for two
reasons: 1) the size of the hyperplanes varies throughout
the grid, leading to load balancing problems, and
2) there is a recursion between the planes, leading
to a large amount of communication.
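The variation in hyperplane size is easy to quantify. The short Python sketch below counts the grid points on each plane j+k+l = const; it uses zero-based indices for convenience, and the function name is hypothetical.

```python
def hyperplane_sizes(nj, nk, nl):
    """Number of grid points on each hyperplane j + k + l = const (0-based)."""
    sizes = [0] * (nj + nk + nl - 2)
    for j in range(nj):
        for k in range(nk):
            for l in range(nl):
                sizes[j + k + l] += 1
    return sizes
```

For a 3 x 3 x 3 grid the planes hold 1, 3, 6, 7, 6, 3 and 1 points. The planes near the corners of the domain are much smaller than those in the middle, so a fixed set of processors cannot be kept equally busy on every plane.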

Parallelization of the LU-SGS algorithm in (14) has
been addressed by other researchers. Barszcz et
al. [19] implemented the LU-SSOR algorithm, which
is similar to LU-SGS, on a massively parallel machine
by restructuring the data layout using a skew-hyperplane
approach. Although they were able to
extract reasonable parallelism with this approach,
the data layout is complex and considerable effort
was required to implement the domain partitioning
in an efficient manner when using a MIMD (Multiple
Instruction Multiple Data) implementation. Also,
the restructuring of data on the left hand side in
turn causes the right hand side layout to be skewed
and extra communication is required. Overall, the
LU-SGS algorithm (14) is not conducive to efficient
parallel execution.

Several researchers have proposed modifications of
the LU-SGS algorithm to make it more parallelizable.
Candler et al. [21, 22] have investigated a modification
called Data-Parallel LU Relaxation (DP-LUR),
which has shown excellent results in a data-parallel
environment. It is used in this study and is
discussed more thoroughly in section 3.1. Wong et
al. [20] have investigated a domain decomposition
implementation of LU-SGS. For two-dimensional
steady state reacting flow problems, they found that,
while the convergence rate of the operator is reduced
with the domain breakup, the effect is relatively
weak (e.g. with 64 subdomains, the number
of iterations increases by less than 20%). Thus, the
domain decomposition strategy appears promising,
and is used as a basis for the Hybrid algorithm, discussed
in section 3.2.

3.1  DP-LUR  Method 

A modification of LU-SGS, referred to as Data-Parallel
LU Relaxation, has been introduced by Candler
et al. [21, 22] for solving hypersonic flow problems.
Essentially, the modification involves transferring
the nondiagonal terms to the right hand side
and using values from the previous iteration for these
terms. The modified operator then becomes Jacobi-like
and requires only nearest neighbor communication.
This operator has been found to be very efficient
in a data-parallel environment (e.g. [22, 23]).
The DP-LUR modification of the LU-SGS algorithm
is given in (15).


$$\Delta Q^{(0)}_{j,k,l} = -D^{-1} h\,RHS^n$$

For i = 1, ..., imax Do

$$\begin{aligned}
\Delta Q^{(i)}_{j,k,l} = D^{-1} h \Big[ -RHS^n + \big( & A^+_{j-1}\Delta Q^{(i-1)}_{j-1} + B^+_{k-1}\Delta Q^{(i-1)}_{k-1} + C^+_{l-1}\Delta Q^{(i-1)}_{l-1} \\
 - & A^-_{j+1}\Delta Q^{(i-1)}_{j+1} - B^-_{k+1}\Delta Q^{(i-1)}_{k+1} - C^-_{l+1}\Delta Q^{(i-1)}_{l+1} \big) \Big]
\end{aligned} \tag{15}$$

End Do

The main difference between the LU-SGS and DP-LUR
algorithms is that a Jacobi sweeping strategy is
used in DP-LUR while Gauss-Seidel sweeps are used
in LU-SGS. The advantage of using Jacobi sweeps is
that there is no recursion of data and only nearest
neighbor communication is required at each node.
Thus, it can be completely load balanced with communications
only at the borders of each partition
(Fig. 2).

Figure 2: Jacobi sweeping strategy of DP-LUR algorithm.
Load balanced parallelism with nearest
neighbor communication.

Although DP-LUR is more amenable to parallel processing
than LU-SGS, the use of Jacobi sweeps leads
to a larger amount of computational work. It is
well-known that a Jacobi method will have a theoretically
slower convergence rate than Gauss-Seidel.
Multiple sweeps (e.g. 4-6) are therefore required inside
Eq. (15) to maintain a comparable convergence
rate to LU-SGS. Although DP-LUR can be executed
efficiently on a parallel machine, the added computational
cost is a significant penalty, the specifics of
which are discussed in section 5.1. The question is
whether the computational penalty of DP-LUR is
the best that we can do.

3.2 Hybrid Method

The motivation behind development of the Hybrid
approach is to remove a source of inefficiency in
DP-LUR. The DP-LUR algorithm was developed
primarily for data-parallel computations. Its convergence
is independent of the number of processors
used because the same Jacobi sweeping strategy
that allows nearest neighbor communications
between the processors is also used for the computations
on each processor. Doing the on-processor
computations with Jacobi sweeps is a source of inefficiency,
since the computational work can be performed
more efficiently with the Gauss-Seidel sweeps
of LU-SGS. The strategy behind the Hybrid approach
is to use the communications structures of
the DP-LUR algorithm, to maintain load-balanced
parallelism with nearest neighbor communications,
along with the more efficient LU-SGS algorithm for
the on-processor computations. The algorithm is
referred to as the Hybrid approach because it retains
features of both the LU-SGS and DP-LUR algorithms.

$$\Delta Q^{(0)}_{j,k,l} = -D^{-1} h\,RHS^n$$

For i = 1, ..., imax Do

$$\begin{aligned}
\Delta Q^{(i)*}_{j,k,l} &= D^{-1} h \left[-RHS^n + \left(A^+_{j-1}\Delta Q^{(i)*}_{j-1} + B^+_{k-1}\Delta Q^{(i)*}_{k-1} + C^+_{l-1}\Delta Q^{(i)*}_{l-1}\right)\right] \\
\Delta Q^{(i)}_{j,k,l} &= \Delta Q^{(i)*}_{j,k,l} - D^{-1} h \left(A^-_{j+1}\Delta Q^{(i)}_{j+1} + B^-_{k+1}\Delta Q^{(i)}_{k+1} + C^-_{l+1}\Delta Q^{(i)}_{l+1}\right)
\end{aligned} \tag{16}$$

End Do

Neighbor values that lie on another processor are taken
from the previous sweep, $\Delta Q^{(i-1)}$.


The equations used inside the sweeps of (16) are the
same as those used by the LU-SGS algorithm (14).
Thus, with 1 sweep (i.e. imax = 1), the Hybrid
algorithm is very similar to a domain decomposition
implementation of LU-SGS, the only difference being
the initial condition on the first line of (16). The use
of multiple sweeps improves the convergence rate,
making up for the loss of connection in the domain
decomposition.

On 1 processor (with 1 sweep), the method is identical
to the original LU-SGS algorithm. On many
processors (i.e. in the limiting condition where the
number of processors approaches the number of gridpoints),
the Hybrid method is identical to the DP-LUR
algorithm. The computational workload of the
Hybrid algorithm, therefore, is dependent upon the
number of processors used. The algorithm should be
most efficient with few processors, and should always
require less computational work than DP-LUR.
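The two limiting cases can be checked on a small model problem. The following Python sketch is an illustration only, under several assumptions: a one-dimensional grid, constant scalar coefficients ap >= 0 and am <= 0 standing in for the split Jacobians, rho standing in for the spectral radius, zero values outside the grid, and the last iterate taken as the update. "Processors" are simply contiguous blocks of points handled one after another, with border values frozen at the previous sweep. All names are hypothetical.

```python
def lu_sgs_block(b, x_prev, lo, hi, h, ap, am, d):
    """Forward and backward LU-SGS sweeps on points lo..hi-1.

    Values just outside the block come from x_prev (the previous sweep).
    """
    n = len(b)
    left = x_prev[lo - 1] if lo > 0 else 0.0
    right = x_prev[hi] if hi < n else 0.0
    star = []
    for j in range(lo, hi):
        upstream = star[-1] if j > lo else left
        star.append((b[j] + h * ap * upstream) / d)
    out = [0.0] * (hi - lo)
    for j in reversed(range(lo, hi)):
        downstream = out[j - lo + 1] if j < hi - 1 else right
        out[j - lo] = star[j - lo] - h * am * downstream / d
    return out


def hybrid(rhs, h, ap, am, rho, nblocks, sweeps):
    """Hybrid sketch: LU-SGS inside each block, Jacobi-style border coupling."""
    n = len(rhs)
    d = 1.0 + h * rho
    b = [-h * r for r in rhs]
    x = [bj / d for bj in b]                     # initial condition
    bounds = [n * p // nblocks for p in range(nblocks + 1)]
    for _ in range(sweeps):
        x_prev = x[:]                            # border data exchanged once per sweep
        x = []
        for p in range(nblocks):
            x.extend(lu_sgs_block(b, x_prev, bounds[p], bounds[p + 1], h, ap, am, d))
    return x


def dp_lur(rhs, h, ap, am, rho, sweeps):
    """DP-LUR sketch: every neighbor value comes from the previous sweep."""
    n = len(rhs)
    d = 1.0 + h * rho
    b = [-h * r for r in rhs]
    x = [bj / d for bj in b]
    for _ in range(sweeps):
        old = x
        x = [(b[j] + h * ap * (old[j - 1] if j > 0 else 0.0)
              - h * am * (old[j + 1] if j < n - 1 else 0.0)) / d
             for j in range(n)]
    return x
```

With nblocks = 1 and sweeps = 1 the border values are never used and hybrid reduces to the plain two-step LU-SGS solve. With one point per block every neighbor is a border value, and hybrid produces the same iterates as dp_lur.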

Parallel implementation of the Hybrid algorithm is
done in essentially the same way as DP-LUR. Border
data is communicated to nearest neighbors at the beginning
of each sweep and each processor performs
the standard LU-SGS algorithm on its domain. Because
the size of the domains corresponds with the
number of processors used, the convergence will be
different with different processor partitions. However,
like DP-LUR, the Hybrid algorithm maintains
load balanced parallelism with only nearest neighbor
communications.

4.  PARALLEL  IMPLEMENTATION 

A MIMD approach (i.e. requiring message passing)
is used for parallel implementation. There are two
reasons for choosing the MIMD approach over a
SIMD (Single Instruction Multiple Data) or data-parallel
approach: 1) code portability, because message
passing codes are more portable to different
parallel architectures (e.g. from massively parallel
supercomputers to workstation clusters), and 2)
ease of implementation, since the original code is
over 6000 lines and it is much easier to add message
passing directives to the existing code than to rewrite
the entire code in a High Performance Fortran type
language (e.g. CMFortran). To ensure easy portability
of the code, a set of generic message passing
subroutines was used. With this protocol, the specific
message passing commands can be altered in
one line of the code rather than throughout, making
conversion to different message passing languages,
such as PVM (Parallel Virtual Machine) and MPI
(Message Passing Interface), a relatively short procedure.

Fig. 3 shows the breakup of the three-dimensional
solution domain. The flowfield domain is laid out
on a two-dimensional array of processors. The flowfield
is split in the wraparound (J) and spanwise (K)
directions. The normal direction (L) is left intact so
that the implementation of surface boundary conditions
is unchanged from the existing serial code. A
single layer of ghost cells is placed on the border of
each processor, providing a location where the communicated
data can be stored.
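The layout described above can be sketched in a few lines of Python. This is a hypothetical illustration, not the TURNS partitioning routine: the paper does not state how dimensions that do not divide evenly are handled, so the sketch simply gives the first few partitions one extra point, and it adds the ghost layer on all four J and K faces, including those on physical boundaries.

```python
def block_ranges(npoints, nparts):
    """Split npoints into nparts contiguous half-open ranges of near-equal size."""
    base, extra = divmod(npoints, nparts)
    ranges, start = [], 0
    for p in range(nparts):
        size = base + (1 if p < extra else 0)
        ranges.append((start, start + size))
        start += size
    return ranges


def local_shape(j_range, k_range, nl, nghost=1):
    """Local array shape: J and K padded with ghost layers, L left intact."""
    nj = j_range[1] - j_range[0]
    nk = k_range[1] - k_range[0]
    return (nj + 2 * nghost, nk + 2 * nghost, nl)
```

For example, block_ranges(135, 19) gives two partitions of 8 points followed by seventeen of 7, and a processor owning an 8 x 17 patch of a grid with 35 points in the normal direction would store a 10 x 19 x 35 local array.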

Figure 3: Partitioning the three-dimensional domain
on a two-dimensional array of processors.

The communications between neighboring processors
are done once during each of the inner sweeps
of the DP-LUR and Hybrid algorithms, totaling
4 x imax communication steps. One communication
step is required to pass information to form the
RHS, since third order accuracy requires data at the
j,k,l±2 points. This communication step could be
eliminated if a layer of two ghost cells were used but
this increases memory and communications. The
boundary conditions at the flowfield borders and on
the rotor blade can be imposed locally on each processor,
but communication is required in the wake
region where L = 1 to enforce the boundary condition
where the C-H grid collapses and data is averaged
across this wake plane. Only the processors
holding this data perform communications and only
one communication step is needed.

5.  RESULTS  AND  DISCUSSION 

The TURNS code with the DP-LUR and Hybrid implicit
operators has been implemented on the massively
parallel Thinking Machines CM-5 at the Army
High Performance Computing Research Center (AHPCRC)
in Minneapolis, MN. The CM-5 has a total
of 896 processors, configurable in processor partitions
of 64, 256, and 512 processors. The implementation
is performed by adding message passing calls
to the existing Fortran 77 code. The message passing
calls are taken from the CMMD library, which
is supported by Thinking Machines, Inc.

Each processor on the CM-5 has a peak performance
of 5 Mflops. Vector Units (VU's) exist
on each processor that increase the performance
substantially (e.g. from 5 Mflops/processor to 128
Mflops/processor). Unfortunately, the only way to
utilize the VU's at this time is to rewrite the code in
CMFortran, a High-Performance-Fortran type language.
Since TURNS is over 6K lines, rewriting the
code would require considerable effort and was one
of the main reasons we chose the MIMD implementation
in the first place. In addition, rewriting the
code in CMFortran would eliminate code portability.
Consequently, the results presented here are determined
without utilizing the VU's. Although this
degrades the performance on the CM-5, it is not a
big drawback overall, because our future plans are
to run the code on parallel systems such as the IBM
SP-2 and workstation clusters, which do not have
vector units.

The code is run for a test problem that computes
the quasi-steady flowfield around a symmetric OLS
blade. The OLS blade has a sectional airfoil thickness
to chord ratio of 9.71% and is a 1/7 scale model
of the main rotor for the Army's AH-1 helicopter. A
135 x 50 x 35 C-H type grid is used, with the domain
extending eight chords in all directions. The
upper half of the grid is shown in Fig. 4.

Figure 4: Upper half of the 135 x 50 x 35 C-H type
grid used for OLS airfoil calculations on the CM-5.

We chose to use this particular grid and airfoil because they
were used for calculations in the aeroacoustic study
in [12]. Unfortunately, the unusual mesh dimensions
cannot be partitioned in a way that exactly matches
the processor partitions on the CM-5 (64, 256, and
512 processors). For example, the J dimension has
only odd factors so it is impossible to partition it on
an even number of processors. We did break up the
mesh in a way that used most of the processors in
the partition. For the 64 node partition, the mesh
was divided into 19 partitions in the J direction and 3
in the K direction, giving a total of 57 processors.
For the 256 node partition, the mesh was
divided into 19 and 12 partitions in the J and K directions,
respectively, giving 228 processors. Finally,
for the 512 node partition, the mesh was divided into
19 and 24 partitions in the J and K directions,
giving 456 processors. When executing the code, the
remaining processors in the partition sit idle. Generally,
most newer machines (e.g. IBM SP-2) allow
the user to choose the exact number of processors
they want for their partition, so this will most likely
not be an issue on more modern machines.
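The fraction of each CM-5 partition that does useful work follows directly from the processor counts quoted above. The small Python helper below is a hypothetical illustration of that arithmetic.

```python
def utilization(j_parts, k_parts, partition_size):
    """Processors used and the fraction of the partition they represent."""
    used = j_parts * k_parts
    return used, used / partition_size
```

All three layouts (19 x 3 on 64 nodes, 19 x 12 on 256 nodes, and 19 x 24 on 512 nodes) use the same fraction of their partition, 0.890625, so about 11% of the processors sit idle in every case.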

The three dimensional quasi-steady starting solution
is computed around the rotating blade in subsonic
conditions, with $M_{tip}$ = 0.664, and a more transonic
condition, with $M_{tip}$ = 0.80. In both cases,
the freestream Mach number is $M_\infty$ = 0.17 and the
blade position is fixed at zero degrees azimuth angle
(Fig. 5). It should be noted that the first case,
$M_{tip}$ = 0.664, is a realistic test case for rotor calculations.
The $M_{tip}$ = 0.800 case, however, is far
too transonic to be used in a practical helicopter application.
It was added as an extreme test case to
investigate the behavior of the implicit solvers with
more nonlinear transonic flows.


Figure  5:  Quasi-Steady  solution.  Blade  fixed  at  zero 
degrees  azimuth  angle. 

It should also be noted that results are presented
for a quasi-steady fixed blade case instead of an unsteady
case because the convergence behavior of the
implicit solvers can be quantified most easily with
this quasi-steady case. It is difficult to investigate
convergence behavior with an unsteady case without
also verifying time-accuracy for the implicit solver.
This does not indicate that the method is unable
to perform unsteady runs. The same algorithm is used
for time-accurate unsteady cases so these cases can
be run without further modifications to the algorithm.

5.1  DP-LUR  Results 

The results of timings of TURNS with the DP-LUR
algorithm on 57, 228, and 456 processors are
given in Tables 1 and 2, for the $M_{tip}$ = 0.664 and
$M_{tip}$ = 0.800 cases, respectively. The method is
stopped when the density residual drops by two orders
of magnitude below its maximum value. Plots
of the L2-norm density residual vs. number of iterations
are shown for the two cases in Figs. 6 and
7. The convergence of the original LU-SGS method,
run on a single processor, is also shown in the plots
for comparison purposes.


Table 1 - Timing Results on the CM-5 for TURNS
with DP-LUR for subsonic test case. 135 x 50 x 35
mesh, $M_{tip}$ = 0.664, density residual converged to
5 x 10^-[illegible].

Sweeps   Procs   Iterations   % Comm.   Tot. Time
5        57      436          10.4 %    9330 sec
5        228     440          15.3 %    2508 sec
5        456     438          21.0 %    1445 sec
6        57      351           9.2 %    8505 sec
6        228     350          15.1 %    2233 sec
6        456     353          19.9 %    1292 sec
7        57      304           9.6 %    8229 sec
7        228     304          16.6 %    2110 sec
7        456     306          20.6 %    1224 sec

Figure 6: Convergence of TURNS with DP-LUR
method. $M_{tip}$ = 0.664


Table 2 - Timing Results on the CM-5 for TURNS
with DP-LUR for transonic test case. 135 x 50 x 35
mesh, $M_{tip}$ = 0.800, density residual converged to
5 x 10^-[illegible].

Sweeps   Procs   Iterations   % Comm.       Tot. Time
5        57      464          [illegible]   9902 sec
5        228     457          [illegible]   2628 sec
5        456     465          [illegible]   1511 sec
6        57      379          [illegible]   9210 sec
6        228     380          [illegible]   2424 sec
6        456     383          [illegible]   1402 sec
7        57      335           9.6 %        9068 sec
7        228     335          16.6 %        2383 sec
7        456     345          20.6 %        1380 sec

Figure 7: Convergence of TURNS with DP-LUR
method. $M_{tip}$ = 0.80


Figure 8: Parallel Speedups of the time per iteration using the DP-LUR operator

Table 3 - Timing Results on the CM-5 for TURNS
with the Hybrid method for subsonic test case.
135 x 50 x 35 mesh, $M_{tip}$ = 0.664, density residual
converged to 5 x 10^-[illegible].

Sweeps   Procs   Iterations   % Comm.   Tot. Time
1        57      461          10.3 %    4937 sec
1        228     470          15.1 %    1434 sec
1        456     502          18.8 %     863 sec
2        57      394          10.1 %    5410 sec
2        228     398          14.8 %    1524 sec
2        456     404          20.6 %     889 sec
3        57      386          10.0 %    6423 sec
3        228     385          14.8 %    1771 sec
3        456     385          19.7 %    1012 sec

Table 4 - Timing Results on the CM-5 for TURNS
with the Hybrid method for transonic test case.
135 x 50 x 35 mesh, $M_{tip}$ = 0.800, density residual
converged to 5 x 10^-[illegible].

Sweeps   Procs   Iterations   % Comm.       Tot. Time
1        57      531          [illegible]   5719 sec
1        228     558          [illegible]   1707 sec
1        456     580          [illegible]    998 sec
2        57      483          [illegible]   6568 sec
2        228     485          [illegible]   1858 sec
2        456     492          [illegible]   1082 sec
3        57      467           9.9 %        7748 sec
3        228     466          14.4 %        2143 sec
3        456     470          19.2 %        1226 sec

Figure 9: Convergence of TURNS with Hybrid
method. $M_{tip}$ = 0.664


Figure 10: Convergence of TURNS with Hybrid
method. $M_{tip}$ = 0.80


Figure 11: Parallel Speedups of the time per iteration of TURNS using the Hybrid operator

The convergence plots show that a minimum of 5
inner sweeps (i.e. imax = 5) of DP-LUR are required
to converge the solution. In both the subsonic
and transonic cases, 4 sweeps began to diverge. For
the $M_{tip}$ = 0.664 case, 5 sweeps gives slightly worse
convergence than single processor LU-SGS while 6
sweeps gives slightly better. For the $M_{tip}$ = 0.800
case, 5 sweeps of DP-LUR gives about the same convergence
as single processor LU-SGS, and 6 sweeps is
better. This seems to indicate that DP-LUR maintains
a good level of robustness for transonic cases,
since it requires fewer inner sweeps to maintain the
convergence rate of LU-SGS. The single processor
LU-SGS method requires the work of approximately
1.8 sweeps of DP-LUR. Consequently, these results
show that, in order to maintain the same convergence
rate, the DP-LUR implicit operator requires
about 3 times the computational work of single processor
LU-SGS.

Timings of the DP-LUR method indicate that more
sweeps seem to be the better choice. The overall
CPU time with 7 sweeps is fastest, but the difference
between 6 and 7 sweeps is small (less than 2%). Each
additional sweep increases the CPU time per iteration
by 10-15%. Communication represents a relatively
small percentage of the total CPU time. The
communication percentage increases with increasing
number of processors. Also, the percentages tend to
fluctuate for different cases, which is probably due
to the fact that these runs were done on a loaded
rather than dedicated machine.

It should be noted that, in theory, the solution using
DP-LUR is the same regardless of the number of
processors used, so the number of iterations should
be the same for all processor partitions. However,
Tables 1 and 2 show that the implementation did
show some slight discrepancies in the number of iterations.
Generally, the differences are small (less than
4%) and we attribute them to numerical roundoff in
the machine. Differences in the overall solution are
indistinguishable for the different partition sizes.

A plot of the parallel speedups of the time per iteration
of TURNS with DP-LUR is shown in Fig. 8.
The speedup from 57 to 228 processors is nearly linear,
but some falloff is noted for 456 processors. This
is believed to be due to the relatively small problem
size of 236,250 gridpoints. It is expected that the
speedup will be more linear with larger problems.
The parallel speedup increases slightly for a larger
number of sweeps, since the amount of computational
work goes up. However, the difference is not
significant.
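The speedups can be recomputed from the iteration counts and total times in Table 1. The Python sketch below does this relative to the 57 processor run. It assumes that the first, unlabeled group of rows in Table 1 is the 5 sweep case on 57, 228 and 456 processors, by analogy with Table 2; the names are hypothetical.

```python
# (iterations, total CPU seconds) from Table 1, keyed by sweeps, then processors.
# The 5-sweep labels are an assumption made by analogy with Table 2.
TABLE_1 = {
    5: {57: (436, 9330), 228: (440, 2508), 456: (438, 1445)},
    6: {57: (351, 8505), 228: (350, 2233), 456: (353, 1292)},
    7: {57: (304, 8229), 228: (304, 2110), 456: (306, 1224)},
}


def time_per_iteration(iterations, total_seconds):
    """Average CPU seconds spent on one iteration."""
    return total_seconds / iterations


def relative_speedup(rows, base_procs=57):
    """Speedup of the time per iteration relative to the base processor count."""
    base = time_per_iteration(*rows[base_procs])
    return {p: base / time_per_iteration(*rows[p]) for p in sorted(rows)}
```

With 5 sweeps this gives a speedup of roughly 3.75 going from 57 to 228 processors (ideal 4) and roughly 6.5 going from 57 to 456 processors (ideal 8). With 7 sweeps the corresponding values are slightly higher, in line with the trend described above.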

5.2  Hybrid  Results 

Results of timings with the Hybrid algorithm are
given in Tables 3 and 4, for the $M_{tip}$ = 0.664 and
$M_{tip}$ = 0.800 cases, respectively. Plots of the density
residual vs. CPU time are given in Figs. 9 and 10.

The efficiency of the Hybrid method is apparent in
the number of inner sweeps required for convergence.
While DP-LUR required a minimum of 5 sweeps, the
Hybrid method converges at a comparable rate to
single processor LU-SGS with only 1 sweep. This
is due to the more efficient Gauss-Seidel procedure
used for the on-processor computations. With 2
sweeps, the convergence of the Hybrid method is
almost identical to single processor LU-SGS. With
one sweep, there is significant spread between the
convergence curves for the different numbers of processors,
but with 2 sweeps, the spread is reduced
considerably so that all processor partitions follow
essentially the same convergence path as LU-SGS.
Although it is not shown in the figures, the convergence
plot with 3 sweeps is only slightly better than
with 2, and it is therefore not plotted to keep the
graph from becoming too crowded.

The Hybrid method is considerably faster than DP-LUR.
The CPU times of the Hybrid method are only
55-60% those of DP-LUR. This is due to the larger
amount of computational work in DP-LUR, because
a larger number of sweeps are required for convergence.

It should be pointed out that each sweep with DP-LUR
involves only a single sweep through the domain
on each processor, whereas the Hybrid method
performs the two-step LU-SGS algorithm on each
processor, performing two sweeps through the domain.
Thus, each sweep of the Hybrid method is
approximately equivalent to the work of two sweeps
in DP-LUR. This is indicated in the CPU times; the
CPU time using 6 sweeps of DP-LUR is approximately
equal to 3 sweeps using the Hybrid method.

Using 1 sweep in the Hybrid method gives the best
CPU time, but requires 17-18% more iterations than
single processor LU-SGS. The CPU time with 2
sweeps is worse than that of 1 sweep by about 8%,
but the convergence rate is much closer to that of
single processor LU-SGS. When 3 sweeps are used,
the convergence is only slightly better (a reduction
in iterations of less than 5%) than 2 sweeps, while
the CPU time is about 11-15% more. Thus, 3 sweeps
or more appear to be unnecessary.

A plot of the parallel speedups of the time per iteration
is shown in Fig. 11. The parallel speedups are
essentially the same as with DP-LUR.

6.  SUMMARY  AND  CONCLUSIONS 

A strategy is presented for implementing the three-dimensional
Navier-Stokes Rotorcraft CFD code
TURNS on massively parallel computer architectures.
The main portion of the code that is difficult
to parallelize is the implicit timestep using the LU-SGS
operator. We study two modifications of this
operator that make it more amenable to parallel implementation.
The first is the Data-Parallel LU Relaxation
(DP-LUR) technique, which essentially replaces
the Gauss-Seidel sweeps in LU-SGS with Jacobi
sweeps, and uses multiple sweeps of the domain
to maintain the same convergence rate. The second
is a new approach that couples the Jacobi communication
strategy of DP-LUR with Gauss-Seidel
sweeps of LU-SGS for the on-processor computations.
It also uses multiple inner sweeps to maintain
the convergence rate of LU-SGS. Because this second
approach retains features of both the DP-LUR
and LU-SGS algorithms, we call it a Hybrid method.

The TURNS code is tested on the Thinking Machines
CM-5, using a MIMD approach for parallel
implementation. It is run for an Euler quasi-steady
calculation with 236,250 gridpoints, computing the
flow around the tip of a helicopter blade rotating
with subsonic and transonic tip Mach numbers. Results
from various processor partitions show that
both the DP-LUR and Hybrid modifications of LU-SGS
are very parallelizable, showing good parallel
speedups. Both methods are also able to maintain
the convergence qualities of original LU-SGS for all
test cases. The Hybrid method, however, requires
less CPU time due to lower computational work requirements.
The DP-LUR modification of LU-SGS
causes the amount of computational work in the implicit
solver to increase threefold, to maintain the
same convergence rate. The Hybrid modification,
however, can match to within 25% the convergence
rate of single processor LU-SGS with no increase in
the computational work. It can exactly match the
convergence rate with twice as much work in the implicit
solver, yielding CPU times that are only 8%
higher than the single sweep cases. Overall, the CPU
times for the Hybrid method are only 55-60% those
of DP-LUR.

The computational work required of the Hybrid approach
on a parallel machine will always be less than
that of DP-LUR. On a few processors, the amount
of computational work will be about the same as
LU-SGS. The Hybrid approach is therefore ideally
suited for machines that have smaller numbers of
more powerful, non-vectorized processors. One example
of a machine that fits this category is the 150
processor IBM SP-2. We are currently implementing
the code on the IBM SP-2 at NASA Ames, and
expect better CPU times than those obtained
on the CM-5.

Finally,  although  the  TURNS  code  is  used  primarily 
for  rotorcraft  CFD  applications,  the  parallelization 
strategy  is  not  unique  to  this  application.  The  paral¬ 
lelization  procedures  proposed  here  could  be  readily 
used  for  other  CFD  codes  that  use  the  LU-SGS  al¬ 
gorithm. 
Abstract

The  present  article  reports  on  further  developments 
of  an  implicit  coupled  algorithm  for  fluid  flow  equa¬ 
tions.  Mass  and  momentum  conservation  equations 
are  solved  as  part  of  one  large  system  of  equations 
in  one  single  step.  Iterations  are  needed  because 
of  nonlinearities  only.  The  algorithm  requires  no 
under-relaxation  factors  and  can  reach  convergence 
in  a  reduced  number  of  iterations,  compared  to  de¬ 
coupled  approaches.  This  article  describes  improve¬ 
ments  leading  to  reduction  of  both  memory  and  com¬ 
puting  time.  The  algorithm  exceeds  the  memory  re¬ 
quirements  of  the  SIMPLE  algorithm  of  Patankar 
and  Spalding  by  a  factor  of  where  K  is  the  num¬ 
ber  of  independent  variables.  Computing  time  reduc¬ 
tion  was  achieved  by  using  GMRES  and  a  precon¬ 
ditioner  based  on  incomplete  LU  factorization.  The 
algorithm  compares  favourably  with  conventional  de¬ 
coupled  approaches.  To  overcome  the  high  mem¬ 
ory  requirements  and  enable  the  simulation  of  large 
physical  problems  two  different  approaches  for  par¬ 
allelization  were  also  tested,  at  the  expense  of  in¬ 
creased  computing  time. 

1  INTRODUCTION 

The SIMPLE [1, 2] algorithm is amongst the most
widely used algorithms for solving the fluid flow
equations. The convergence difficulties of SIMPLE
when dealing with large problems, either in
terms of physical complexity or grid size, are well
known and have been discussed in the open literature
(e.g. [3], [4], [5]). SIMPLE is relatively easy to
implement and to extend to a larger number
of transport equations, but its sensitivity to numerical
parameters such as under-relaxation factors
[6] has led to many research efforts and even to new
algorithms (e.g. [7]), sometimes closely related to
SIMPLE (e.g. [8], [9]).

The algorithm discussed in this article (designated
DIRECTO [10]) solves the fluid flow equations as a
complete coupled system. The cell face velocities are
predicted using a momentum equation which, once
substituted into the continuity equation, leads to a
pressure equation fully coupled to the velocity field. No
simplification is made at this stage: the equation is
exact, as opposed to the pressure (or pressure-correction)
equation derived in segregated [11] algorithms.

The  code  development  was  made  using  the  two 
classical  geometries  of  a  two-dimensional  cavity  with 
a sliding lid and a backward-facing step. Results [12]
show that, for instance, for the square cavity
with sliding lid at Re = 1000, the DIRECTO algorithm
with  LU  factorization  converged  in  8  iterations,  inde¬ 
pendently  of  grid  size,  for  a  residual  error  of  1  x  10“^. 
On the other hand, the SIMPLE algorithm, although
requiring 186 iterations for a 64x64 grid, used 10 times
less CPU time. This order-of-magnitude ratio was
reduced by using Block Band LU factorization [13]
and GMRES [14], since each iteration requires the
solution of a large, sparse, unsymmetric, block-band
(block tridiagonal) linear system.

The  need  for  finer  grids,  mainly  on  complex  ge¬ 
ometries,  leads  to  very  large  systems  of  equations  re¬ 
quiring  the  use  of  secondary  storage  and  large  CPU 
times.  Parallel  architectures  with  distributed  mem¬ 
ory  may  be  one  answer  to  those  problems.  The  main 
drawbacks  are  the  communication  between  proces¬ 
sors  and  additional  computations.  In  this  work  a 
cluster  of  4  DEC  AlphaStation  AXP,  models  500S 
and  600S,  connected  by  FDDI  (Fiber  Distributed 
Data  Interface)  and  Gigaswitch  using  PVM  (Parallel 
Virtual  Machine)  [15,  16]  was  used  as  a  parallel  envi¬ 
ronment.  PVM  is  a  software  package  that  allows  the 
concurrent  use  of  heterogeneous  processing  elements. 

The  article  is  made  up  of  3  major  Sections.  In  Sec¬ 
tion  2  the  algorithm  is  described.  Section  3  discusses 
the  results  with  respect  to  accuracy,  memory  require¬ 
ments  and  computing  time,  including  a  discussion  on 
the  linear  solvers  and  parallelization.  Section  4  con¬ 
cludes  the  article. 


2  MATHEMATICAL  MODEL 

The governing equations for two-dimensional incompressible
Newtonian and isothermal flows are, in tensorial notation,

$$\frac{\partial U_i}{\partial x_i} = 0 \qquad (1)$$

and,

$$\rho \frac{\partial (U_i U_j)}{\partial x_j} = -\frac{\partial P}{\partial x_i} + \frac{\partial}{\partial x_j}\left[\mu\left(\frac{\partial U_i}{\partial x_j} + \frac{\partial U_j}{\partial x_i}\right)\right], \qquad (2)$$

where $U_i$ is the velocity component along the $x_i$ direction
and $P$, $\rho$ and $\mu$ are the static pressure, density
and dynamic viscosity, respectively.

Figure 1: Control volume for nodal point P (upper
and lower case denote nodal and face values, respectively).

When the first-order derivatives of equations (1) and (2)
are integrated in the control volume P (Fig. 1) of a
non-staggered grid, new equations arise that depend on
the velocities at the faces of the control volume.

Face velocities

The relationship between the nodal and face values
is found by discretization (in a control volume centred
at the face of control volume P) of a simplified
version of equation (2), obtained assuming mass conservation
and constant viscosity,

$$\rho \frac{\partial (U_i U_j)}{\partial x_j} = -\frac{\partial P}{\partial x_i} + \mu \frac{\partial^2 U_i}{\partial x_j^2}. \qquad (3)$$

Two different discretizations of the convective
term of equation (3) were tested: (D1), with a first-order
upwind scheme for the two derivatives; and
(D2), with a second-order central difference scheme
for the derivative in the direction perpendicular to
the unknown velocity only. For instance, for velocity
$u_e$ we have in case of D1 discretization, if $u_e^{k-1} > 0$
and $v_e^{k-1} > 0$,

$$\rho \frac{\partial (U_i U_j)}{\partial x_j} = 2 \rho u_e^{k-1} \frac{u_e^k - U_P^k}{\Delta x} + \rho v_e^{k-1} \frac{2 u_e^k - U_S^k - U_{SE}^k}{2 \Delta y} \qquad (4)$$

and in case of D2,

$$\rho \frac{\partial (U_i U_j)}{\partial x_j} = 2 \rho u_e^{k-1} \frac{u_e^k - U_P^k}{\Delta x} + \rho v_e^{k-1} \frac{u_{ne}^k - u_{se}^k}{\Delta y} \qquad (5)$$

if $u_e^{k-1} > 0$. The superscripts $k-1$ and $k$ denote
previous and current iteration.

The right-hand side of equation (3) is discretized
using second-order finite differences,

$$\frac{\partial P}{\partial x} = \frac{P_E^k - P_P^k}{\Delta x} \qquad (6)$$

$$\frac{\partial^2 U}{\partial x^2} = 4\, \frac{U_P^k - 2 u_e^k + U_E^k}{\Delta x^2} \qquad (7)$$

$$\frac{\partial^2 U}{\partial y^2} = 4\, \frac{u_{ne}^k - 2 u_e^k + u_{se}^k}{\Delta y^2}. \qquad (8)$$

Replacing (4)-(8) into equation (3) gives a linear relation, (9),
between $u_e^k$ and sums over the neighbouring nodes (nb) of
coefficients multiplying the nodal values of velocity and
pressure surrounding face east.

Mass conservation

When equation (1) is integrated in the control volume
of Fig. 1,

$$\int_V \frac{\partial U_i}{\partial x_i}\, dV = (u_e^k - u_w^k)\, \Delta y + (v_n^k - v_s^k)\, \Delta x \qquad (10)$$

The equations for face velocities (9) are now substituted
into (10), leading to an algebraic equation, (11), made of
three sums over the neighbouring nodes (nb): one for the
nodal values of U, one for V and one for P. The velocity
coefficients represent links to a total of 18 nodal velocities,
i.e. a 9-node star for each of U and V surrounding P, whereas
the pressure coefficients include connections to 5 nodal
pressures ($P_P$, $P_E$, $P_W$, $P_N$ and $P_S$).


Momentum  conservation 

The integration of equation (2) along the $i = 1$ direction
is,


p  {uZ  "uZ  —  u 


+  p 
+  p 
+  fj‘ 


UZ.)  Ay  + 
k-l...k\ 


uZ  —  v'Z  "u'z)  Ax  =  — 
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jjk 

Pe 


Up  Up-Uw 
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NE 
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~  ^NW 


Ay  J 

^SE  +  ^SW 


4Ax 


4 


Ay,  (12) 
which, after replacing the equations for the velocities
at the faces, yields an algebraic equation, (13), again made
of sums over the neighbouring nodes (nb) for U, V and P.

The U coefficients represent links to the 9 nodal U
velocities surrounding P, and the V coefficients represent
links to 4 nodal velocities ($V_{NE}$, $V_{NW}$, $V_{SE}$ and
$V_{SW}$). The pressure coefficients include the contributions
from 7 nodal pressures ($P_P$, $P_E$, $P_W$, $P_{NE}$, $P_{NW}$,
$P_{SE}$ and $P_{SW}$).

The momentum equation in the $i = 2$ direction, equation (14),
may be obtained by an identical procedure.

Equations  (11),  (13)  and  (14)  are  all  assembled  in  a 
single  system  of  equations  and  solved  simultaneously. 
The system of equations is of the form

$$A x = b. \qquad (15)$$

Here $x$ is a vector made of a sequence of blocks with the
variables U, V and P. The order of matrix $A$ is
$(NI-2) \times (NJ-2) \times K$, where $NI \times NJ$ is the
problem size and $K$ stands for the number of variables
(i.e., 3 in the case of a two-dimensional laminar flow).
This is a sparse, unsymmetric, block-band (block tridiagonal)
linear system. Because its solution is the most time-consuming
part of the algorithm, special attention was given to
this subproblem (see Section 3.3.1).

After  solution  of  the  linear  system  (15)  one  global 
iteration  is  completed.  Because  of  the  non-linearity 
of  the  differential  governing  equations,  several  global 
iterations  are  needed  to  obtain  convergence  and  new 
coefficients  are  calculated  using  the  new  velocity  and 
pressure  fields,  repeating  the  process  until  conver¬ 
gence.  The  nomenclature  “global  iteration”  is  used 
here  to  distinguish  from  the  number  of  iterations  as¬ 
sociated  with  the  solver. 
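
The global iteration described above can be illustrated on a deliberately small problem. The sketch below is not the DIRECTO code: it only applies the same idea (freeze the coefficients at iteration k-1, solve the resulting linear system, repeat until the update is small) to a hypothetical 2x2 nonlinear system chosen for illustration. The names `assemble`, `solve_2x2` and `global_iterations`, and the toy system itself, are assumptions.

```python
def assemble(x_prev):
    """Coefficient matrix frozen at the previous iterate (hypothetical toy problem)."""
    return [[2.0 + x_prev[0] ** 2, -1.0],
            [-1.0, 2.0 + x_prev[1] ** 2]]


def solve_2x2(a, b):
    """Direct solution of a 2x2 linear system by Cramer's rule."""
    det = a[0][0] * a[1][1] - a[0][1] * a[1][0]
    return [(b[0] * a[1][1] - a[0][1] * b[1]) / det,
            (a[0][0] * b[1] - a[1][0] * b[0]) / det]


def global_iterations(b, x0, tol=1e-10, max_iter=200):
    """Repeat 'assemble with old values, solve the linear system' until converged."""
    x = list(x0)
    for k in range(1, max_iter + 1):
        x_new = solve_2x2(assemble(x), b)
        change = max(abs(p - q) for p, q in zip(x_new, x))
        x = x_new
        if change < tol:
            return x, k
    return x, max_iter


x, n_global = global_iterations(b=[1.0, 1.0], x0=[0.0, 0.0])
```

In this toy case both unknowns converge to the real root of x^3 + x - 1 = 0; the number of global iterations depends on how strongly the coefficients vary with the solution, which is the only reason iterations are needed at all.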

Because all the conservation equations are solved
as part of a single set, with no decoupling (or segregation,
according to the nomenclature of ref. [11]),
the algorithm can converge in a small number of iterations,
and for this reason it has been designated
DIRECTO [10] (direct, in English).

3  DISCUSSION  OF  RESULTS 

The  code  development  was  made  using  the  two  clas¬ 
sical  geometries  of  a  two-dimensional  cavity  with  a 
sliding  lid  and  a  sudden  expansion.  In  this  paper  re¬ 
sults  will  be  presented  for  the  two-dimensional  square 
cavity  only. 

This  Section  discusses  3  major  aspects  of  the  algo¬ 
rithm:  the  accuracy,  memory  requirements  and  com¬ 
puting  time,  in  subsections  3.1,  3.2  and  3.3,  respec¬ 
tively. 


3.1  Accuracy 

To  obtain  the  accuracy  of  DIRECTO  we  performed 
simulations  of  the  flow  in  a  two-dimensional  square 
cavity  with  sliding  lid  for  2  Reynolds  numbers,  (400 
and  1000),  and  3  grid  sizes  (64x64,  96x96  and 
128x128). The Reynolds number is defined as
$Re = \rho U_{lid} H/\mu$, where $U_{lid}$ is the lid velocity and $H$ is
the size of the square cavity.

The  velocities  were  set  constant  at  every  bound¬ 
ary,  and  zero  normal  gradient  for  the  pressure  was 
used.  This  condition  was  implemented  in  an  im¬ 
plicit  fashion  to  preserve  the  implicit  feature  of  the 
method.  The  calculations  were  stopped  for  residuals 
lower  than  1  x  10“®.  The  residuals  are  the  sum  of  the 
absolute errors of the algebraic equations divided by
reference quantities for the momentum and continuity
equations, the latter being $\rho U_{lid} H$. Calculations
were all performed in single precision.


Method        Grid          U_min      V_min      V_max

DIRECTO D1    64           -0.31999   -0.43943    0.29404
              96           -0.32443   -0.44721    0.29897
              128          -0.32614   -0.44996    0.30090
              Exact value  -0.32878   -0.45356    0.30399
              Accuracy      1.73       1.97       1.69

DIRECTO D2    64           -0.31956   -0.43968    0.29383
              96           -0.32430   -0.44729    0.29893
              128          -0.32608   -0.45000    0.30087
              Exact value  -0.32877   -0.45360    0.30380
              Accuracy      1.78       1.95       1.77

CPI           64           -0.32368   -0.44862    0.29925
              96           -0.32653   -0.45163    0.30183
              128          -0.32751   -0.45274    0.30271
              Exact value  -0.32873   -0.45431    0.30379
              Accuracy      2.05       1.85       2.07

SIMPLE        128          -0.32614   -0.45119    0.30143

Table 1: Square cavity results for Re = 400 (CPI
results from Deng et al., 1994).


The exact values and the order of accuracy
of the results were estimated following the generalization
of the Richardson extrapolation method. The
results on finite grids can be written as the exact value
plus the leading term of the truncation error,

$$\phi_1 = \phi_{ex} + h_1^n X_n + \ldots , \qquad (16)$$

$$\phi_2 = \phi_{ex} + h_2^n X_n + \ldots , \qquad (17)$$

$$\phi_3 = \phi_{ex} + h_3^n X_n + \ldots , \qquad (18)$$

where $h$ is the grid spacing in both directions and
$X_n$ is a grid function, assumed the same for every grid
spacing ($h_1$, $h_2$ and $h_3$). Provided that $h$ is small
enough for the leading term to be dominant, the order
$n$ of the numerical scheme is estimated as [17],

$$n = \frac{\ln\left[A\,(h_2/h_1)^n - A + 1\right]}{\ln(h_3/h_1)} \qquad (19)$$

with,

$$A = \frac{\phi_3 - \phi_1}{\phi_2 - \phi_1}. \qquad (20)$$

different  velocity  coefficients  in  discretized  momen¬ 
tum  equation  of  CPI  and  DIRECTO  for  control  vol¬ 
umes  near  the  boundaries.  Differences  could  also  oc¬ 
cur  in  the  implementation  of  the  pressure  boundary 


Tables 1 and 2 show non-dimensional values of the
maximum and minimum V velocity ($V_{max}$ and $V_{min}$)
on the horizontal centreline and the minimum value of U
($U_{min}$) on the vertical centreline. The tables include values
predicted on 3 different grids and also the order of
accuracy and estimated exact value, obtained by
Richardson extrapolation. Results are also shown for the
SIMPLE algorithm of Patankar [1, 2] and for the CPI
(Consistent Physical Interpolation) and CSG (Centred
Staggered Grid) methods of Deng et al. [17]. D1 and D2
are versions of the DIRECTO algorithm with the convective
term of equation (3) discretized using equations
(4) and (5), respectively.


Method        Grid          U_min      V_min      V_max

DIRECTO D1    64           -0.36722   -0.49426    0.35565
              96           -0.37763   -0.51104    0.36628
              128          -0.38198   -0.51747    0.37037
              Exact value  -0.38510   -0.52146    0.37292
              Accuracy      1.26       1.38       1.38

DIRECTO D2    64           -0.36544   -0.49382    0.35414
              96           -0.37704   -0.51088    0.36577
              128          -0.38177   -0.51747    0.37022
              Exact value  -0.39003   -0.52772    0.37701
              Accuracy      1.57       1.73       1.75

CPI           64           -0.37436   -0.51015    0.36364
              96           -0.38233   -0.51947    0.37109
              128          -0.38511   -0.52280    0.37369
              Exact value  -0.38867   -0.52724    0.37702
              Accuracy      2.01       1.94       2.01

CSG           64           -0.35726   -0.48858    0.34556
              96           -0.37441   -0.50982    0.36271
              128          -0.38050   -0.51727    0.36884
              Exact value  -0.38855   -0.52690    0.37705
              Accuracy      1.96       1.94       1.99

SIMPLE        128          -0.37233   -0.51014    0.36234

Table 2: Square cavity results for Re = 1000 (CPI
and CSG results from Deng et al., 1994).


The SIMPLE algorithm [1, 2] is implemented here
on a non-staggered grid [18], [19] and uses the hybrid
finite difference scheme, switching from central
to upwind differencing for mesh Reynolds numbers
higher than 2.

The  CPI  method  of  Deng  et  al  [17]  is  similar 
to  the  DIRECTO  method.  CPI  uses  a  governing 
differential  equation  for  momentum  and  a  relation- 


conditions; Deng et al. [17] do not state explicitly
what kind of boundary conditions were used.

In the CSG method of Deng et al. [17] the
convection terms are discretized using central
finite differences for all mesh Reynolds numbers
($\rho U_i \Delta x_i/\mu$) and a staggered grid is used, to avoid
velocity interpolation when discretizing the mass
conservation equation.

As can be seen in Table 1, the estimated exact
values from the D1, D2 and CPI methods agree well with
each other. The maximum percentage difference is 0.17%,
and is found in $V_{min}$ for the D1 method. The
estimated accuracy of D1 and D2 is almost second
order, although lower than that of the CPI method (if
based on $U_{min}$ or $V_{max}$).

In the case of Re=1000, Table 2, the decrease in
accuracy of the D1 method is obvious. This is due to the
first-order upwind scheme used to discretize the convective
term of equation (3), as can be seen by comparison
with the D2 method, which uses a second-order
accurate finite difference discretization. Nevertheless,
the estimated accuracy of CPI is closer
to second order than that of D2. The largest
difference between the estimated exact values produced
by D2 and CPI is 0.35% (based on $U_{min}$).
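
The loss of accuracy attributed to first-order upwinding can be seen on a much simpler problem than the cavity flow. The sketch below, which is only an illustration and not part of DIRECTO, estimates the observed order of a one-sided (upwind-type) difference and of a central difference for the derivative of sin(x) by halving the step; the test function, the point x = 1 and the step size are arbitrary assumptions.

```python
import math


def upwind(f, x, h):
    """First-order backward difference (upwind for a positive velocity)."""
    return (f(x) - f(x - h)) / h


def central(f, x, h):
    """Second-order central difference."""
    return (f(x + h) - f(x - h)) / (2.0 * h)


def observed_order(scheme, f, dfdx, x, h):
    """Order estimated from the error reduction when h is halved."""
    e_coarse = abs(scheme(f, x, h) - dfdx(x))
    e_fine = abs(scheme(f, x, h / 2.0) - dfdx(x))
    return math.log(e_coarse / e_fine) / math.log(2.0)


order_upwind = observed_order(upwind, math.sin, math.cos, 1.0, 1e-2)
order_central = observed_order(central, math.sin, math.cos, 1.0, 1e-2)
```

The one-sided formula should show an order close to one and the central formula an order close to two, which is the qualitative difference between the D1 and D2 treatments of the convective term.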

Compared to DIRECTO, the SIMPLE algorithm with
the hybrid scheme requires finer grids to achieve
an identical level of accuracy, as can be seen in Fig. 2
and Table 2. At Re=1000, the results of the D1 and D2
methods for a grid of 96x96 are much closer to the
estimated exact value of the CPI method than the
results of SIMPLE with a grid of 128x128. This lack
of accuracy is due to the hybrid scheme, which is first-order
accurate for Peclet numbers greater than 2.
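
The switch at a cell Peclet number of 2 can be made concrete with the usual textbook form of the hybrid scheme coefficients. This is a sketch under simplifying assumptions (one direction only, the same convective strength F and diffusive conductance D on both faces); the function name and arguments are illustrative and are not taken from the SIMPLE code used in this work.

```python
def hybrid_coefficients(flux, diffusion):
    """Neighbour coefficients (a_W, a_E) of the hybrid scheme in one direction.

    flux      : convective strength F = rho * u (assumed equal on both faces)
    diffusion : diffusive conductance D = mu / dx (assumed equal on both faces)
    """
    a_w = max(flux, diffusion + 0.5 * flux, 0.0)
    a_e = max(-flux, diffusion - 0.5 * flux, 0.0)
    return a_w, a_e


# Peclet number F/D = 1: central differencing is retained
low_pe = hybrid_coefficients(flux=1.0, diffusion=1.0)
# Peclet number F/D = 4: pure upwinding, diffusion is dropped
high_pe = hybrid_coefficients(flux=4.0, diffusion=1.0)
```

Below the threshold both neighbours contribute, as in central differencing; above it the downstream coefficient vanishes and only the upstream node is used, which is where the first-order behaviour comes from.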


ship  between  face  and  nodal  values  identical  to  our 
equations (3) and (4), designated D1. In CPI and
CSG,  the  governing  differential  equations  of  mo¬ 
mentum  are  discretized  at  the  centre  of  the  control 
volumes  before  integration,  while  in  the  DIRECTO 
method  the  equations  are  first  integrated  in  the  con¬ 
trol  volume  and  then  discretized.  This  leads  to 


Figure  2:  Evolution  of  Umin  with  grid  resolution  for 
Re=400. 

Fig. 3 shows the streamlines for the D2 and SIMPLE
methods for Re=1000 and a grid of 96x96. SIMPLE
is unable to predict the streamline distribution
in the centre. The dashed line (SIMPLE) at the
centre of the flow represents the value -0.11,
whereas the solid line (D2) represents -0.115.


Figure 3: Stream function for D1 method (solid line) and
SIMPLE method (dashed line) at Re=1000.

Given  the  similarities  between  DIRECTO  and  the 
algorithms  CPI  and  CSG  of  Deng  et  al.  [17]  one 
would  expect  higher  accuracy  of  the  DIRECTO  algo¬ 
rithm;  this  is  an  aspect  requiring  further  attention. 

3.2  Memory  Requirements 

Fig. 4 shows the memory requirements for different
implementations of DIRECTO, compared to SIMPLE
using hybrid differencing.

The coefficient matrix, derived from a 9-node star,
has a dimension of $(NI-2) \times (NJ-2) \times K$ by
$(NI-2) \times (NJ-2) \times K$, where $NI \times NJ$ is the problem
size, and $K$ stands for the number of variables (i.e.,
3 in the case of a two-dimensional laminar flow). This is
shown by line a) in Fig. 4.

Because of the block-tridiagonal structure it can
be stored as a $[(NI-2) \times (NJ-2)] \times K$ by
$2 \times [(NI-2) \times 3 + 5] + 1$ matrix (line b) in Fig. 4).

For finer grids the block band structure becomes
sparser. This was exploited by using a sparse matrix
structure, storing only the non-zero values in
a vector, the column indices in an integer vector
and using pointers to the beginning of each row.
This structure reduced the memory requirements to
$[(NI-2) \times (NJ-2) \times K] \times K \times 9$ (line c) in Fig. 4).
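
The storage scheme described in the last paragraph (values, column indices and row pointers) corresponds to what is commonly called compressed sparse row storage. A minimal sketch of it follows, written in Python purely for illustration; the original implementation was in FORTRAN and its data layout may differ in detail. The small 4x4 matrix is an arbitrary example.

```python
def to_csr(dense):
    """Compress a dense matrix (list of rows) into three arrays."""
    values, col_index, row_ptr = [], [], [0]
    for row in dense:
        for j, a in enumerate(row):
            if a != 0.0:
                values.append(a)        # non-zero coefficient
                col_index.append(j)     # its column
        row_ptr.append(len(values))     # where the next row starts
    return values, col_index, row_ptr


def csr_matvec(values, col_index, row_ptr, x):
    """y = A x using only the stored non-zeros."""
    y = []
    for i in range(len(row_ptr) - 1):
        s = 0.0
        for p in range(row_ptr[i], row_ptr[i + 1]):
            s += values[p] * x[col_index[p]]
        y.append(s)
    return y


dense = [[4.0, 1.0, 0.0, 0.0],
         [2.0, 5.0, 1.0, 0.0],
         [0.0, 1.0, 3.0, 2.0],
         [0.0, 0.0, 1.0, 6.0]]
values, col_index, row_ptr = to_csr(dense)
y = csr_matvec(values, col_index, row_ptr, [1.0, 2.0, 3.0, 4.0])
```

Only the matrix-vector product is shown because it is the operation an iterative solver needs most often; the memory used grows with the number of non-zeros rather than with the bandwidth.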

On the other hand, SIMPLE only requires 5 matrices
of dimension $(NI-2)$ by $(NJ-2)$ to store the
coefficients (line d) in Fig. 4).
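
To give a feeling for the numbers behind the four storage options, the expressions quoted in this subsection can be evaluated directly. The helper below is only a worked example of those expressions: it counts stored coefficients, not bytes, and the function name and dictionary keys are illustrative.

```python
def storage_entries(ni, nj, k=3):
    """Number of stored coefficients for layouts a) to d) of Fig. 4."""
    n_nodes = (ni - 2) * (nj - 2)      # interior nodes
    n_unknowns = n_nodes * k           # order of the coupled matrix
    return {
        "a_full": n_unknowns * n_unknowns,
        # band width as quoted in the text: 2 x [(NI-2) x 3 + 5] + 1
        "b_band": n_unknowns * (2 * ((ni - 2) * 3 + 5) + 1),
        "c_sparse": n_unknowns * k * 9,
        "d_simple": 5 * n_nodes,
    }


counts = storage_entries(64, 64)
```

For a 64x64 grid with K = 3 this gives roughly 1.3e8 entries for the full matrix, 4.4e6 for the band storage, 3.1e5 for the sparse storage and 1.9e4 for SIMPLE, so the sparse layout is still about 16 times larger than SIMPLE.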

3.3  Computing  Time 

The following computer tests were run on a DEC
AlphaStation AXP 3000, model 600S, for sequential
algorithms and on a cluster of 4 DEC AlphaStations AXP
3000, models 500S and 600S, connected by FDDI and
Gigaswitch using PVM, for parallel versions.

Figure 4: Memory requirements of DIRECTO (a),
b) and c)) compared to SIMPLE (d)) using hybrid
finite difference discretization scheme.

3.3.1  Linear  solvers 

The Gaussian elimination method was used initially
[10] during the FORTRAN implementation of
the algorithm. The first idea was to optimize the
Gaussian elimination method by adapting it to the
block band structure and using BLAS kernels and
the LAPACK library [20], on a VAX 6520-2VP vector
processor. This reduced the computing time, but it
remained far from that of the SIMPLE+TDMA method
and required a large amount of storage [13].

The  next  stage  was  the  use  of  an  iterative  method 
so  that  the  sparse  structure  could  be  taken  into  ac¬ 
count.  An  iterative  method  has  the  additional  ad¬ 
vantage  of  controlling  the  degree  of  accuracy  for  solv¬ 
ing  the  linear  system  of  equations.  Because  the  solver 
is  an  inner  step  of  a  global  iteration  required  because 
of  nonlinearities,  solving  the  equations  to  a  high  de¬ 
gree  of  accuracy  may  prove  useless. 

Several methods were tested and GMRES (Generalized
Minimum Residual) [14] was retained for
its robustness. GMRES is a Galerkin type method
based on an orthonormal basis of a Krylov subspace.
To obtain the solution of

$$A x = b ,$$

an approximation is sought of the form

$$x_k = x_0 + z_k , \qquad (21)$$

where $x_0$ is an initial solution with residual

$$r_0 = b - A x_0 . \qquad (22)$$

$z_k$ is computed such that its residual projected onto
the Krylov subspace generated by $r_0$ is minimized.
Iterative methods of this type require the use of
preconditioners in order to improve the convergence
rate. Several preconditioners were tested and the
best proved to be the incomplete LU factorization
of degree zero, ILU(0), and ILUT [14, 21]. The diagonal
preconditioner, although very simple to implement,
did not give as good results as the others [12].
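
The structure of equations (21) and (22) can be followed in a compact GMRES sketch. The version below is a teaching example only: it is dense, unrestarted and unpreconditioned, written in plain Python, and it builds the Krylov basis with modified Gram-Schmidt and solves the small least-squares problem with Givens rotations. It is not the FORTRAN solver used for the results in this paper, and the function names and the 3x3 test system are assumptions.

```python
import math


def matvec(a, x):
    return [sum(aij * xj for aij, xj in zip(row, x)) for row in a]


def dot(u, v):
    return sum(ui * vi for ui, vi in zip(u, v))


def gmres(a, b, x0, m, tol=1e-12):
    """At most m steps of unrestarted, unpreconditioned GMRES."""
    r0 = [bi - yi for bi, yi in zip(b, matvec(a, x0))]   # equation (22)
    beta = math.sqrt(dot(r0, r0))
    if beta < tol:
        return list(x0)
    v = [[ri / beta for ri in r0]]   # orthonormal Krylov basis
    h_cols = []                      # columns of the rotated Hessenberg matrix
    cs, sn = [], []                  # Givens rotations
    g = [beta]                       # rotated right-hand side
    for j in range(m):
        w = matvec(a, v[j])
        col = []
        for vi in v:                 # modified Gram-Schmidt
            hij = dot(w, vi)
            col.append(hij)
            w = [wk - hij * vk for wk, vk in zip(w, vi)]
        h_next = math.sqrt(dot(w, w))
        col.append(h_next)
        for i in range(j):           # apply earlier rotations to the new column
            t = cs[i] * col[i] + sn[i] * col[i + 1]
            col[i + 1] = -sn[i] * col[i] + cs[i] * col[i + 1]
            col[i] = t
        d = math.hypot(col[j], col[j + 1])
        cs.append(col[j] / d)
        sn.append(col[j + 1] / d)
        col[j], col[j + 1] = d, 0.0
        g.append(-sn[j] * g[j])      # |g[j+1]| is the current residual norm
        g[j] = cs[j] * g[j]
        h_cols.append(col)
        if abs(g[j + 1]) < tol or h_next < tol:
            break
        v.append([wk / h_next for wk in w])
    k = len(h_cols)
    y = [0.0] * k
    for i in range(k - 1, -1, -1):   # back substitution
        s = g[i] - sum(h_cols[q][i] * y[q] for q in range(i + 1, k))
        y[i] = s / h_cols[i][i]
    # x_k = x_0 + z_k, equation (21)
    return [x0[i] + sum(y[q] * v[q][i] for q in range(k)) for i in range(len(x0))]


a = [[4.0, 1.0, 0.0], [2.0, 5.0, 1.0], [0.0, 1.0, 3.0]]
b = [6.0, 15.0, 11.0]
x = gmres(a, b, [0.0, 0.0, 0.0], m=3)
```

A preconditioner such as ILU(0) or ILUT would be applied inside the loop, to the product of the matrix with each basis vector; it is omitted here to keep the sketch short.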

Table  3  shows  the  total  CPU  times,  and  cor¬ 
responding  number  of  outer  iterations  needed  to 
achieve  a  residual  of  1  x  10“®,  on  a  DEC  AlphaS- 
tation  AXP  3000  model  600S,  for  several  grid  sizes, 
using the DIRECTO+GMRES methods and for the
SIMPLE+TDMA method [22]. The SIMPLE algorithm
was used with 4 sweeps of TDMA for computation
of the velocities and 8 for computation of the
pressure. It can be seen in Table 3 that the CPU


Grid   SIMPLE           D/GMRES          D/GMRES
       TDMA             ILU(0)           ILUT

32     13.1s (124)      8.5s (19)        11.7s (16)
64     169.9s (381)     99.4s (20)       89.2s (16)
96     764.2s (735)     608.4s (39)      285.2s (23)
128    2292.0s (1212)   10689.1s (91)    1667.2s (65)

Table  3:  CPU  time  and  number  of  iterations  for  a 
residual  of  1  x  10“®. 


times are competitive. ILU(0) is a good choice for
small grid sizes and ILUT is recommended for finer
grids because it keeps the number of outer iterations
low.

3.3.2  Parallelization 

For this type of problem, parallelization by domain
decomposition was selected. A non-overlapping
domain decomposition strategy was used, where the
domain was decomposed into disjoint subdomains
separated by interfaces. The grid nodes were numbered
first inside each subdomain and then on the
interfaces, leading to the bordered block diagonal
matrix shown in Figs. 5 and 6 [23].
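
The numbering strategy just described can be sketched for a small structured grid. The example below is an illustration under stated assumptions, not the authors' code: the domain is cut by vertical separator columns, interior nodes are numbered subdomain by subdomain and interface nodes last, and a check confirms that nodes of different subdomains are never neighbours in a 9-node star, which is what produces the bordered block diagonal structure. All names are hypothetical.

```python
def one_way_dissection(nx, ny, separators):
    """Number interior nodes subdomain by subdomain, then the interface nodes.

    Returns (order, owner): order is the new sequence of (i, j) nodes and
    owner[(i, j)] is the subdomain index, or -1 for interface nodes.
    """
    cuts = sorted(separators)
    owner = {}
    for i in range(nx):
        sub = sum(1 for c in cuts if c < i)
        for j in range(ny):
            owner[(i, j)] = -1 if i in cuts else sub
    order = []
    for s in range(len(cuts) + 1):
        order += [node for node in sorted(owner) if owner[node] == s]
    order += [node for node in sorted(owner) if owner[node] == -1]
    return order, owner


def couples_different_subdomains(order, owner):
    """True if two 9-node-star neighbours lie in different subdomains."""
    for (i, j) in order:
        for di in (-1, 0, 1):
            for dj in (-1, 0, 1):
                a = owner[(i, j)]
                b = owner.get((i + di, j + dj), -1)
                if a >= 0 and b >= 0 and a != b:
                    return True
    return False


order, owner = one_way_dissection(nx=8, ny=4, separators=[2, 5])
decoupled = not couples_different_subdomains(order, owner)
```

With this ordering each subdomain contributes one diagonal block that can be handled by a separate process, while the interface unknowns form the border that has to be treated jointly.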


Figure  5:  One-way  dissection  ordering. 

The algorithm was parallelized in two versions: the
first using a master-slave approach where the master
performs the computations corresponding to the
interface, and the second using an SPMD (Single
Program Multiple Data) approach where each processor
deals with a subdomain and one interface. Each
processor had an independent preconditioner.

Figure 6: Reordered (one-way dissection) matrix.

Table 4 reports the CPU and elapsed times on a
cluster of workstations for a 64x64 grid and 3, 4 and
5 processes. There was a reduction in elapsed time
when passing from 3 to 4 processes; this is a 21%
reduction, corresponding to a relative speed-up of 1.27.
For 5 processes, there is a degradation of CPU and
elapsed times because the cluster is composed of only
4 machines, and more than 1 process has to
share the same processing element.


Processes   CPU time                  Elapsed time
            Master     Slave (max.)

3           192.7s     2734.3s        3090.4s
4           627.7s     1780.3s        2434.5s
5           1167.7s    1217.4s        3401.5s

Table 4: CPU and elapsed time for a 64x64 grid and
Master-Slave approach.
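
The relative speed-up and the time reduction discussed for Table 4 follow directly from the elapsed times. The short calculation below only restates that arithmetic; the function names are illustrative.

```python
elapsed = {3: 3090.4, 4: 2434.5, 5: 3401.5}   # seconds, from Table 4


def relative_speedup(t_ref, t_new):
    """Speed-up of a run taking t_new relative to one taking t_ref."""
    return t_ref / t_new


def time_reduction(t_ref, t_new):
    """Fractional reduction in elapsed time."""
    return 1.0 - t_new / t_ref


s_3_to_4 = relative_speedup(elapsed[3], elapsed[4])
r_3_to_4 = time_reduction(elapsed[3], elapsed[4])
s_3_to_5 = relative_speedup(elapsed[3], elapsed[5])
```

Going from 3 to 4 processes gives a speed-up of about 1.27, while the value for 5 processes is below one, reflecting the oversubscription of the 4 available machines.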


To exploit parallelism at a finer grain it is
necessary to reduce the CPU time spent by the master,
so that it keeps pace with the decrease in total time
brought about by the reduction of the CPU time in the
slaves. Based on this need, another parallel version
of the code was created, based on an SPMD strategy.
Table 5 shows the CPU and elapsed times of
the Master-Slave and SPMD approaches for 3 subdomains
on a 64x64 grid.

               CPU time    Elapsed time

Master-Slave   1780.3s     2434.5s
SPMD           1568.5s     1642.3s

Table 5: CPU and elapsed times for a 64x64 grid for
Master-Slave and SPMD approaches.

The SPMD approach is faster because it is better suited
to the sparse nature of the problem. Furthermore, for an
identical number of subdomains, the SPMD approach uses
one process fewer than the Master-Slave approach. However,
the implementation of the SPMD approach is more
complex and, given the reduced number of workstations
available running PVM, we cannot yet conclude
which of these approaches is the most appropriate.

4  CONCLUSIONS 

The present article reported on further developments
of an implicit coupled algorithm for fluid flow
equations. The main conclusions of this study are the
following.

1.  Computing  time  reduction  was  achieved  by 
moving  from  a  direct  to  an  iterative  solver  based 
on  GMRES. 

2.  It was shown that DIRECTO + GMRES with
ILU(0) and ILUT preconditioners is always
faster than SIMPLE+TDMA (4 and 8 sweeps
for velocities and pressure, respectively). ILU(0)
is a good preconditioner for coarse grids and
ILUT is better for finer grids, because it keeps
the number of outer iterations small.

3.  Reduction  of  memory  storage  was  also  achieved 
by  taking  advantage  of  the  sparse  nature  of  the 
coefficient  matrix.  However,  memory  require¬ 
ments  are  still  large  compared  to  SIMPLE  and 
this  is  an  aspect  calling  for  further  investiga¬ 
tion.  Efforts  were  made  to  overcome  this  disad¬ 
vantage  by  reverting  to  domain  decomposition 
techniques,  at  the  expense  of  increased  comput¬ 
ing  time. 
1.  SUMMARY

This  paper  describes  work  done  at  Rockwell  Science  Center 
on  the  development  and  application  of  computational  fluid 
dynamics  (CFD)  solvers  for  unstructured  grids.  A 
description  of  the  use  of  “interior  boundary”  conditions  in 
simulating  moving  bodies  is  also  presented. 

2.  INTRODUCTION 

The  CFD  group  at  Rockwell  Science  Center  has  been 
involved  over  the  past  fifteen  years  in  the  development  and 
application  of  numerical  techniques  for  the  simulation  of 
flow  past  complex  aerodynamic  shapes.  Starting  with 
small  perturbation  equations,  codes  have  been  developed  to 
solve  more  and  more  complex  governing  equations  on 
structured  grids  (Ref.  1-4).  The  latest  version  of  the 
structured  grid  code  solves  Reynolds  Averaged  Navier- 
Stokes  (RANS)  equations  in  generalized  curvilinear 
coordinates.  It  includes  the  ability  to  simulate  reacting 
multispecies  flows  (Ref.  5).  Simulations  requiring  grid 
movements  are  handled  quite  elegantly  using  this  code 
(Ref. 6). CFD codes developed at Rockwell Science Center
have played a significant role in several national projects
including the Space Shuttle, B-1B and National AeroSpace
Plane (NASP) projects.

Time  required  for  performing  accurate  numerical  simulation 
of  complex  fluid-dynamics  problems  is  still  sufficiently 
large  to  discourage  designers  from  including  CFD 
techniques  in  the  design  cycle.  Total  time  required  for  a 
numerical  simulation  consists  of  the  time  required  for 

a)  preprocessing,  which  consists  of  modifying  the  CAD 
geometry  to  a  form  suitable  for  numerical  simulation, 
(in  the  case  of  structured  grids)  dividing  the 
computational  domain  into  zones,  choosing  proper 
grid  resolution  at  the  boundaries  and  finally  grid 
generation, 

b)  solver, 
and 

c)  post-processing,  which  consists  of  extracting 
physical  quantities  like  skin-friction  and  heat- 
transfer,  from  the  numerical  solution;  and 
visualization  of  the  solution. 

Several years of research in structured-grid simulations and
developments in computer software and hardware
technologies have considerably reduced the turnaround time
for numerical solutions. Still, the time required to simulate
flow past complex geometries is unacceptably
large. In particular, the time required for preprocessing
increases almost exponentially as more and more details of
the geometry are included in the simulation. For example,
in the case of the multibody space shuttle configuration
(Ref. 7), several months were needed to generate a
structured grid when the fidelity requirements for the model
employed in the numerical simulation were increased
considerably.

Unstructured  grid  methodologies  appear  to  be  very 
promising,  since  the  preprocessing  time  could  be  orders  of 
magnitude  less  than  that  required  for  structured  grids.  It  is 
indeed  the  case  for  inviscid  flows.  But,  our  experience  with 
unstructured  grid  computations  has  opened  our  eyes  to 
several  issues  involved  in  such  simulations.  We  propose  to 
discuss  some  of  those  issues  in  this  paper. 

Research on the development of unstructured grid solvers
for Computational Fluid Dynamics (CFD) and
Computational ElectroMagnetics (CEM) has been in
progress at the Rockwell Science Center for the past
several years. An unstructured grid solver for CFD, called
UNIV, that can handle tetrahedral, triangular prismatic and
hexahedral cells, has been developed (Ref. 8). UNIV
employs a finite-element-like formulation that uses
piecewise polynomial interpolation for the dependent
variables. The dependent variables are the cell averages of
internal energy, mass, x-, y-, and z-momenta.

Interpolating  polynomials  may  be  discontinuous  across 
cell  boundaries.  An  approximate  Riemann  solver  is  used  to 
resolve  discontinuities  at  cell  boundaries.  The  domain  of  a 
dependent  variable  polynomial  is  restricted  to  a  cell.  The 
discretization  of  the  governing  equations  is  constructed 
directly  from  the  integral  form  of  the  conservation  laws. 
No  variational  principle  or  method  of  weighted  residuals  or 
other  indirect  approach  is  employed.  The  code  has  the 
option to use either a least-square polynomial or an ENO
(Essentially Non-Oscillatory) reconstruction.

Reconstruction is the process of constructing an
interpolating function for a cell that satisfies the cell
average. See Ref. 9 for details on ENO schemes.

The numerical formulation employed in UNIV and a new
approach for simulating bodies in relative motion are
discussed in the following sections. A generalized Lax-
Wendroff scheme for the Euler equations adapted from CEM is
also  presented.  A  pointwise  turbulence  model  that  is  highly 
suitable  for  unstructured  grids  is  discussed.  Lessons  learned 
from  our  experience  with  unstructured  grid  computations  are 
elucidated. 


Paper presented at the AGARD FDP Symposium on “Progress and Challenges in CFD Methods and Algorithms”
held in Seville, Spain, from 2-5 October 1995, and published in CP-578.
3. NUMERICAL FORMULATION

Two different approaches to solving the initial/boundary-
value problem (IBVP) for a general hyperbolic system of
conservation laws in the “conservation-law form”
represented by

$$ \frac{\partial q}{\partial t} + \frac{\partial f_1}{\partial x} + \frac{\partial f_2}{\partial y} + \frac{\partial f_3}{\partial z} = 0 \qquad (1) $$


have been developed. Equation (1) is satisfied at all (x, y, z)
belonging to domain D with prescribed initial and
boundary values for the dependent (conserved) variable
vector q. Here, the Cartesian coordinate directions
(independent variables) are x, y, and z. The components of
the flux tensor in the three coordinate directions are the vectors
$f_1$, $f_2$ and $f_3$. In both approaches the domain D is divided
into several cells, and the integral form of the conservation
equations in each cell, given by

$$ \frac{\partial}{\partial t}(\bar{q} V) + \iint_S (F \cdot n)\, dS = 0 \qquad (2) $$

is solved with prescribed initial values for q,

$$ q_0 = q(x, y, z, t_0) $$

and relevant boundary conditions. Here, $\bar{q}$ denotes the cell
average of the dependent variables;

$$ n = n_x j + n_y k + n_z l $$

is the outward unit normal at any point on the boundary
surface S of a cell; j, k, and l are the unit vectors in the x, y
and z directions respectively; V is the cell volume and F
is the tensor of fluxes with ($f_1$, $f_2$, $f_3$) as components.
Stated  in  words,  Eqn.  2  implies  that  the  rate  of  increase  of  a 
conserved  quantity  (qV)  inside  a  cell  is  given  by  the  net 
inflow  (flux)  of  that  quantity  into  the  cell.  Therefore,  as  in 
the  case  of  cell-centered  finite-volume  structured  grid 
solvers  (Ref.  3),  solving  the  governing  equations  requires 
evaluation  of  surface  integrals  from  known  values  of  cell 
averages. 
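
To make the surface-integral evaluation concrete, the following is a minimal Python sketch, not taken from UNIV, of the closed-surface integral in Eqn. 2 for a single tetrahedral cell with one quadrature point (the face centroid) per face. The function names and the test fluxes are illustrative assumptions. A uniform flux should give zero net outflow, and the flux x/3 should return the cell volume by the divergence theorem.

```python
# Sketch only: centroid-rule surface integral for one tetrahedral cell.
# All names are hypothetical; this is not the UNIV implementation.

def sub(a, b):
    return (a[0] - b[0], a[1] - b[1], a[2] - b[2])

def cross(a, b):
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])

def dot(a, b):
    return a[0] * b[0] + a[1] * b[1] + a[2] * b[2]

def tet_faces(verts):
    """Yield (area_vector, face_centroid) for the four faces.

    The area vector is n*dS with n pointing out of the cell."""
    cell_centroid = tuple(sum(v[k] for v in verts) / 4.0 for k in range(3))
    for skip in range(4):
        a, b, c = [verts[i] for i in range(4) if i != skip]
        area_vec = tuple(0.5 * x for x in cross(sub(b, a), sub(c, a)))
        fc = tuple((a[k] + b[k] + c[k]) / 3.0 for k in range(3))
        # Flip the normal if it points towards the cell centroid.
        if dot(area_vec, sub(fc, cell_centroid)) < 0.0:
            area_vec = tuple(-x for x in area_vec)
        yield area_vec, fc

def net_outflow(verts, flux):
    """Approximate the integral of flux(x).n dS over the cell surface."""
    return sum(dot(flux(fc), area_vec) for area_vec, fc in tet_faces(verts))

unit_tet = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (0.0, 1.0, 0.0), (0.0, 0.0, 1.0)]
uniform = net_outflow(unit_tet, lambda x: (1.0, 2.0, 3.0))
volume = net_outflow(unit_tet, lambda x: (x[0] / 3.0, x[1] / 3.0, x[2] / 3.0))
```

Because the integrand is at most linear on each flat face, the single centroid point integrates both test fluxes exactly.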

Surface integrals are evaluated using numerical quadrature
formulas. In this method an integral is written as the
weighted sum of the integrand evaluated at the quadrature
points. The locations and weights of the quadrature points are
chosen so as to give the best possible approximation for the
integral. Higher order schemes require a larger number of
quadrature points. Choosing the centroid of a surface as the
quadrature point yields second order accuracy. Since only
the  cell-averages  of  the  dependent  variables  are  known,  we 
need  to  develop  a  procedure  for  evaluating  the  dependent 
variables  at  the  quadrature  points  in  order  to  compute  the 
surface  integrals  (fluxes).  The  spatial  accuracy  of  the 
numerical  scheme  is  determined  by  the  accuracy  of  this 
“reconstruction” procedure. The dependent variable vector q
at a quadrature point may not be uniquely specified, since
the point belongs to two neighboring cells with different
polynomial representations. If the two vectors evaluated at
a quadrature point using the polynomial reconstruction in
the two “containing” cells are $q_L$ and $q_R$ (Fig. 1), then a
unique value $q^*$ is determined from the solution of a locally
one-dimensional Riemann problem with $q_L$ and $q_R$ as the
“left” and “right” values. An approximate Riemann solver
suggested by Roe (Ref. 10) is employed for this purpose in
the UNIVERSE-series of codes of which UNIV is a member.
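
As an illustration of how a single interface flux is obtained from "left" and "right" states, the following Python sketch applies a Roe-type approximate Riemann solver to a scalar conservation law (the inviscid Burgers equation). This scalar model is an assumption chosen for brevity; it is a stand-in for the system solver of Ref. 10 and is not the UNIV implementation, and it carries no entropy fix.

```python
# Sketch only: Roe-type flux for u_t + f(u)_x = 0 with f(u) = u**2 / 2.
# Function names are hypothetical.

def burgers_flux(u):
    return 0.5 * u * u

def roe_flux(u_left, u_right):
    """Return the unique interface flux from the "left" and "right" states."""
    # Roe-averaged wave speed: (f(uR) - f(uL)) / (uR - uL) = (uL + uR) / 2.
    a = 0.5 * (u_left + u_right)
    central = 0.5 * (burgers_flux(u_left) + burgers_flux(u_right))
    # The upwind correction removes the contribution of the downwind state.
    return central - 0.5 * abs(a) * (u_right - u_left)
```

When the averaged wave speed is positive the result reduces to the flux of the left state, and when it is negative to the flux of the right state, which is the upwinding behavior expected at a cell face.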

The  two  approaches  alluded  to  at  the  beginning  of  this 
section  differ  in  their  “reconstruction”  procedure  and  also 
in  the  time-stepping  scheme.  Only  explicit  time-stepping 
schemes  are  considered  in  both  approaches.  Both 
approaches  permit  use  of  multiple  quadrature  points  and 
curved  surfaces  for  higher  accuracy.  Codes  developed  using 
these approaches can handle hexahedral, tetrahedral and
triangular-prismatic cells.

Fig. 1 “Left” and “Right” states for the locally one-dimensional
Riemann problem.


3.1 The First Approach: A Finite-Element-Like
Algorithm

The major credit for this work goes to Dr. Chakravarthy.
This approach employs a unified treatment for structured
and unstructured grids. The codes developed using this
formulation are called the UNIVERSE-series of codes. The
UNIVERSE-series includes “least-square” and “ENO” (Ref.
9) reconstruction options. Both these procedures involve
development of an interpolating polynomial $P^C(x,y,z)$ for
each of the conserved quantities, where

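$$ P^C(x,y,z) = \sum_{i=0}^{np} p_i^C\, x^{\lambda_i} y^{\mu_i} z^{\nu_i} \qquad (3) $$

where $p_i^C$ are the coefficients of the polynomial and
$\lambda_i$, $\mu_i$, $\nu_i$ denote the exponents of the i-th term. $P^C$ is
applicable only within a given cell C. The integral of $P^C$ over C
reproduces the corresponding cell average. That is,

$$ \frac{1}{V} \iiint_C P^C\, dV = \sum_i a_i^C p_i^C = \bar{q}^C \qquad (4) $$

where

$$ a_i^C = \frac{\iiint_C x^{\lambda_i} y^{\mu_i} z^{\nu_i}\, dV}{\iiint_C dV} \qquad (5) $$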


The spatial accuracy of the numerical scheme is determined
by the form of $P^C$. A linear polynomial in x, y and z
results in second-order accuracy while a quadratic
polynomial yields a third-order scheme. A linear
polynomial requires evaluation of 4 coefficients, while the
quadratic polynomial requires 10. In the case of the “least-
square” option, the polynomial coefficients are computed
such that the integral of $P^C$ over cell C reproduces the
corresponding cell average values (Eqn. 4), and the
integrals over the neighboring cells satisfy the
corresponding cell averages in a least-square sense. That is,
$$ \frac{\partial E}{\partial p_i^C} = 0 \qquad (6) $$

for 0 ≤ i ≤ np. The error term E is given by

$$ E = \sum_{n=1}^{nc} \left( \sum_i a_i^n p_i^C - \bar{q}^n \right)^2 \qquad (7) $$

where n refers to a neighboring cell, $a_i^n$ is the counterpart of
$a_i^C$ (Eqn. 5) evaluated over cell n, and nc is the number of
cells in the neighborhood of cell C, excluding C itself.
Obviously, a least-square approximation for $P^C$ can be
constructed only if

$$ nc \ge np \qquad (8) $$

Therefore, the neighborhood of a cell should be properly
defined to satisfy equation (8). The UNIVERSE-series CFD
formulation defines a “neighbor” of a given cell in a very
flexible and useful way.

First, we consider two types of cell connectivities (Fig. 2):

(1) Node-aligned cells (NAC)

(2) Surface-aligned cells (SAC)

Fig. 2 Node-aligned and surface-aligned cells.

Next, we consider different types of neighbors:

(1) Touching neighbors (TN)

These include

(1a) Common-node neighbor (CNN)

(1b) Common-face neighbor (CFN)

(1c) Touching-face neighbor (TFN)

(2) Proximity neighbors (PN)

This latter type is defined in terms of distance from a given
cell.

A neighborhood is now defined to be a collection of
neighboring cells. A neighborhood hierarchy is defined as
follows:

$H^0$ is the cell itself.

$H^1$ is the cell and its neighbors.

$H^2$ is the union of $H^1$ and the neighbors of all the
cells in $H^1$.

This process may be continued recursively, and depending
on the order of $P^C$, a neighborhood may be found such that
equation (8) is satisfied.

In the case of ENO (Essentially Non-Oscillatory)
reconstruction, we seek to obtain a “best” polynomial
rather than a “least-squares” one. The “best” polynomial
corresponds to the “smoothest”. As always, the equation
for cell C must be satisfied (Eqn. 4). From the remaining nc
equations, we can select any combination of np equations
and solve the resulting set of np + 1 equations. There are

$$ \binom{nc}{np} \qquad (9) $$

such combinations. The combination that yields the best
polynomial in terms of its ENO property is to be preferred.
For example, when the flow field contains a single shock
wave, the neighbors selected should lie on the same side of
the shock as cell C. This approach may be termed the “best
stencil” formulation and has been applied very successfully
in various forms to structured grid ENO formulations.
Reference 9 contains many different strategies for this
task. Note that the “least squares” strategy may result in a
stencil that includes cells from both sides of a
discontinuity and is hence not desirable.

Alternatively, a “best term” strategy has also been tried
out. In this formulation, the least-squares polynomials are
first determined for all cells. Each coefficient $p_i^C$ of the
polynomial (Eqn. 3) in a given cell C corresponds to the
appropriate derivative of the polynomial (up to a constant
coefficient) evaluated at the centroid of the cell. That is,

$$ p_i^C = K^i \left[ \frac{\partial^{\lambda_i + \mu_i + \nu_i} P^C}{\partial x^{\lambda_i}\, \partial y^{\mu_i}\, \partial z^{\nu_i}} \right]_c \qquad (10) $$

where the subscript c refers to the centroid, $K^i$ is a
constant, and $\lambda_i$, $\mu_i$, $\nu_i$ are the exponents of x, y and z in
the i-th term of Eqn. 3. In the case of the “best term” strategy, we replace
each $p_i^C$ by that computed from the corresponding
derivative at the cell centroid evaluated from a neighboring
cell polynomial $P^n$, provided

$$ |p_i^n| < \alpha\, |p_i^C| \qquad (11) $$

where $\alpha > 1$. In other words, $p_i^C$ is selected such that the
corresponding derivative at the centroid does not differ
“too much” from its value in a neighborhood. This
procedure attempts to construct a reconstruction
polynomial that uses only neighboring cells on the same
side of a discontinuity.
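
The idea of choosing a stencil that stays on one side of a discontinuity can be illustrated in one dimension. The following Python sketch is an assumption-laden simplification (uniform 1-D grid, piecewise-linear reconstruction, hypothetical function names) and not the unstructured UNIVERSE procedure: for each interior cell it compares the two candidate one-sided slopes and keeps the one of smaller magnitude, i.e. the "smoothest" stencil.

```python
# Sketch only: second-order ENO slope selection on a uniform 1-D grid.
# Names are hypothetical; the first and last cells fall back to zero slope.

def eno_slopes(q, dx):
    """Pick, per cell, the one-sided slope of smaller magnitude."""
    slopes = [0.0] * len(q)
    for i in range(1, len(q) - 1):
        left = (q[i] - q[i - 1]) / dx      # stencil {i-1, i}
        right = (q[i + 1] - q[i]) / dx     # stencil {i, i+1}
        slopes[i] = left if abs(left) <= abs(right) else right
    return slopes

def face_states(q, dx):
    """Return (left, right) states at the face between cells i and i+1."""
    s = eno_slopes(q, dx)
    return [(q[i] + 0.5 * dx * s[i], q[i + 1] - 0.5 * dx * s[i + 1])
            for i in range(len(q) - 1)]

step = [1.0, 1.0, 1.0, 0.0, 0.0, 0.0]
step_faces = face_states(step, 1.0)
```

For the step data above, every selected slope is zero, so the reconstructed face states reproduce the jump without overshoot; a least-squares slope spanning both sides of the jump would not have this property.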

For  one-dimensional  shock-tube  problems,  it  has  often 
been  demonstrated  that  it  is  better  to  select  the  best 
stencils  based  on  comparing  interpolates  of  local 
characteristic  variables  and  not  the  conserved  dependent 
variables.  However,  within  the  context  of  unstructured  grid 
formulations  this  approach  is  very  expensive,  and 
consideration  of  such  issues  is  postponed  for  future  work. 
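
To make the least-square option of this section concrete, the following Python sketch fits a linear polynomial (4 coefficients) to a cell and its neighbors. It assumes, for illustration only, that each cell average can be identified with the value at the cell centroid, which is exact for linear polynomials; the constraint of Eqn. 4 is then satisfied by construction and the three remaining coefficients (the gradient) follow from the normal equations. All names are hypothetical and the 3x3 system is solved by Cramer's rule to keep the example self-contained.

```python
# Sketch only: constrained least-squares linear reconstruction for one cell.
# P(x) = qc + grad . (x - xc); names are hypothetical, not UNIV routines.

def det3(m):
    return (m[0][0] * (m[1][1] * m[2][2] - m[1][2] * m[2][1])
            - m[0][1] * (m[1][0] * m[2][2] - m[1][2] * m[2][0])
            + m[0][2] * (m[1][0] * m[2][1] - m[1][1] * m[2][0]))

def least_squares_gradient(xc, qc, neighbors):
    """neighbors: list of (centroid, cell_average) pairs, at least three."""
    A = [[0.0] * 3 for _ in range(3)]
    b = [0.0] * 3
    for xn, qn in neighbors:
        d = [xn[k] - xc[k] for k in range(3)]
        for r in range(3):
            b[r] += d[r] * (qn - qc)
            for c in range(3):
                A[r][c] += d[r] * d[c]
    D = det3(A)
    if abs(D) < 1e-14:
        raise ValueError("neighborhood too small or degenerate; enlarge it")
    grad = []
    for col in range(3):
        M = [row[:] for row in A]
        for r in range(3):
            M[r][col] = b[r]
        grad.append(det3(M) / D)
    return grad

def reconstruct(xc, qc, grad, x):
    return qc + sum(grad[k] * (x[k] - xc[k]) for k in range(3))

def field(x):
    # Linear test field; a linear fit should recover it exactly.
    return 2.0 + 3.0 * x[0] - 1.0 * x[1] + 0.5 * x[2]

xc = (0.1, 0.2, 0.3)
pts = [(1.0, 0.0, 0.0), (0.0, 1.0, 0.0), (0.0, 0.0, 1.0),
       (1.0, 1.0, 1.0), (-1.0, 0.5, 0.0)]
grad = least_squares_gradient(xc, field(xc), [(p, field(p)) for p in pts])
```

The error raised for a degenerate system corresponds to the situation in which the neighborhood must be enlarged before a fit is possible.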

We  have  so  far  discussed  the  spatial  discretization  problem. 
As  far  as  temporal  discretization  is  concerned,  only 
explicit  schemes  have  been  considered.  A  second-order 
time-accurate  formulation  is  given  below  as  an  example. 
This  is  fashioned  after  Heun’s  method  or  the  second-order 
Runge-Kutta  method  (RK2).  The  RK4  method  can  be 
implemented  in  similar  fashion.  Higher  than  second-order 
spatial  accuracy  results  in  reduced  numerical  dissipation, 
and  this  sometimes  necessitates  the  use  of  the  fourth-order 
Runge-Kutta  formulation,  which  has  a  larger  stability  range 
than  the  second-order  Runge-Kutta  method. 

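In semidiscrete form, the equations to be solved are

$$ \frac{d}{dt}(\bar{q} V) = RHS(q, t) \qquad (12) $$

where RHS(q,t) is the net flux. The corresponding time
stepping method can be written as

$$ \begin{aligned}
(\bar{q}V)^1 &= (\bar{q}V)^n + \Delta t\, RHS(q^n, t^n) \\
(\bar{q}V)^{n+1} &= \tfrac{1}{2}\left[ (\bar{q}V)^n + (\bar{q}V)^1 + \Delta t\, RHS(q^1, t^{n+1}) \right]
\end{aligned} \qquad (13) $$

The fourth-order accurate Runge-Kutta scheme can be
written as

$$ \begin{aligned}
k_1 &= RHS(q^n, t^n) \\
(\bar{q}V)^1 &= (\bar{q}V)^n + \tfrac{\Delta t}{2} k_1 \\
k_2 &= RHS(q^1, t^n + \tfrac{\Delta t}{2}) \\
(\bar{q}V)^2 &= (\bar{q}V)^n + \tfrac{\Delta t}{2} k_2 \\
k_3 &= RHS(q^2, t^n + \tfrac{\Delta t}{2}) \\
(\bar{q}V)^3 &= (\bar{q}V)^n + \Delta t\, k_3 \\
k_4 &= RHS(q^3, t^{n+1}) \\
(\bar{q}V)^{n+1} &= (\bar{q}V)^n + \tfrac{\Delta t}{6}\left( k_1 + 2 k_2 + 2 k_3 + k_4 \right)
\end{aligned} \qquad (14) $$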

In  the  above,  the  explicit  dependence  of  RHS  on  t  is  useful 
for  time-dependent  problems  where  the  boundary 
conditions  or  other  behavior  explicitly  depend  on  time. 
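
The two explicit schemes can be sketched in a few lines of Python. This is an illustrative example only: the state is a single scalar standing for (qV) in a cell of unit volume, `rhs` is a hypothetical user-supplied function of state and time, and the model problem d(qV)/dt = -(qV) is an assumption chosen because its exact solution is known.

```python
# Sketch only: Heun (RK2) and classical RK4 steps for d(qV)/dt = RHS(q, t).
# Scalar state with unit cell volume; names are hypothetical.
import math

def rk2_step(rhs, qv, t, dt):
    qv1 = qv + dt * rhs(qv, t)                      # predictor
    return 0.5 * (qv + qv1 + dt * rhs(qv1, t + dt))  # corrector

def rk4_step(rhs, qv, t, dt):
    k1 = rhs(qv, t)
    k2 = rhs(qv + 0.5 * dt * k1, t + 0.5 * dt)
    k3 = rhs(qv + 0.5 * dt * k2, t + 0.5 * dt)
    k4 = rhs(qv + dt * k3, t + dt)
    return qv + dt / 6.0 * (k1 + 2.0 * k2 + 2.0 * k3 + k4)

def integrate(step, rhs, qv0, t_end, n_steps):
    dt = t_end / n_steps
    qv, t = qv0, 0.0
    for _ in range(n_steps):
        qv = step(rhs, qv, t, dt)
        t += dt
    return qv

def decay(qv, t):
    # Model right-hand side with exact solution exp(-t).
    return -qv

q_rk2 = integrate(rk2_step, decay, 1.0, 1.0, 10)
q_rk4 = integrate(rk4_step, decay, 1.0, 1.0, 10)
exact = math.exp(-1.0)
```

With ten steps the RK4 result is several orders of magnitude closer to the exact value than the RK2 result, in line with the formal orders of the two schemes.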


3.2 The Second Approach: Generalized Lax-
Wendroff Scheme


This numerical scheme was originally developed under the
leadership of Dr. Shankar (Ref. 11) for solving Maxwell’s
equations and later was adapted for the Euler equations. This
approach employs a multilevel time stepping scheme. The
second-order scheme uses a two time-level discretization.
The first fractional time-step employs first-order spatial ($q_L$
and $q_R$ are set equal to the corresponding $q^n$) and temporal
discretizations to compute $q^{n+1/2}$. Here, the superscript n
refers to the time-level n. For the second time-step, $q_L$ and
$q_R$ are computed from the corresponding centroidal values
of $q^{n+1/2}$ and $\nabla q^n$ as

$$ q_L\ (\text{or } q_R) = q^{n+1/2} + (r_f - r_c) \cdot \nabla q^n \qquad (15) $$

where $r_f$ and $r_c$ refer to the position vectors of the
centroids of the surface and cell, respectively, and

$$ \nabla q^n = \frac{1}{V} \iint_S n\, q^*\, dS \qquad (16) $$

where $q^*$ is obtained from Roe’s approximate Riemann
solver with $q_L$ and $q_R$ set equal to $q^n$. The algorithm
described above may be considered as a generalization of
Lax-Wendroff upwind integration, since it reduces to the
Lax-Wendroff scheme for uniform rectangular hexahedral
cells. Note that only details of the second order scheme are
presented, and that extension to higher order schemes is
indeed straightforward, albeit tedious.
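
The gradient evaluation and face extrapolation used in the second time-step can be sketched as follows. The Python example assumes that Eqn. 16 has the Green-Gauss form (surface integral of n q* divided by the cell volume) with one centroid quadrature point per face, and it takes the face values q* as given inputs rather than obtaining them from a Riemann solver. All names and the unit-cube test cell are illustrative assumptions.

```python
# Sketch only: Green-Gauss cell gradient and linear extrapolation to a face.
# Face values q* are supplied directly; names are hypothetical.

def green_gauss_gradient(volume, faces):
    """faces: list of (outward_area_vector, q_star_at_face_centroid)."""
    g = [0.0, 0.0, 0.0]
    for area_vec, q_star in faces:
        for k in range(3):
            g[k] += area_vec[k] * q_star
    return [c / volume for c in g]

def face_state(q_cell, grad, r_face, r_cell):
    """q at the face centroid: q_cell + (r_f - r_c) . grad(q)."""
    return q_cell + sum(grad[k] * (r_face[k] - r_cell[k]) for k in range(3))

def field(x, y, z):
    # Linear test field with gradient (2, 3, -1).
    return 1.0 + 2.0 * x + 3.0 * y - 1.0 * z

# Unit cube [0,1]^3: outward area vectors and face-centroid values.
cube_faces = [
    ((-1.0, 0.0, 0.0), field(0.0, 0.5, 0.5)),
    ((1.0, 0.0, 0.0), field(1.0, 0.5, 0.5)),
    ((0.0, -1.0, 0.0), field(0.5, 0.0, 0.5)),
    ((0.0, 1.0, 0.0), field(0.5, 1.0, 0.5)),
    ((0.0, 0.0, -1.0), field(0.5, 0.5, 0.0)),
    ((0.0, 0.0, 1.0), field(0.5, 0.5, 1.0)),
]
cube_grad = green_gauss_gradient(1.0, cube_faces)
q_at_x1 = face_state(field(0.5, 0.5, 0.5), cube_grad,
                     (1.0, 0.5, 0.5), (0.5, 0.5, 0.5))
```

For a linear field the gradient is recovered exactly, so the extrapolated face state matches the field value at the face centroid.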

3.3  Computation  of  Viscous  Fluxes 

Viscous fluxes at a quadrature point on a cell face are
computed as the mean of the corresponding contributions
from the two adjacent cells that share the face. That is, the
average of the derivatives computed from the polynomial
reconstruction in the two adjacent cells is employed in the
calculation.  When  the  quadrature  point  lies  on  the  boundary 
of  the  computational  domain,  the  polynomial 
reconstruction  employed  is  centered  about  the  quadrature 
point. 


4.  BOUNDARY  CONDITIONS 

The  implementation  of  boundary  conditions  ensures 
consistency  in  flux  computations.  That  is,  just  like  in  the 
case  of  any  interior  cell  boundary,  computation  of  fluxes 
for  a  cell  boundary  that  lies  on  the  boundary  of  the 
computational  domain  involves  determination  of  “left”  and 
“right”  states  and  Roe’s  approximate  Riemann  solver.  The 
state  that  corresponds  to  the  “outside”  of  the  domain  should 
satisfy  the  appropriate  boundary  conditions.  For  instance, 
when computing fluxes for a cell on the left boundary of the
domain where the inviscid tangency condition is to be
satisfied, the “left” state should be such that the
corresponding velocity vector is tangential to the
surface. This manner of imposing boundary conditions
ensures that only the information at a boundary that
corresponds to waves propagating into the computational
domain is actually used in the computation of fluxes.
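
A minimal Python sketch of the tangency example follows. It builds an "outside" state whose velocity has no component along the wall normal by subtracting the normal component of the interior velocity. Copying density and pressure unchanged from the interior state is an assumption made here for illustration, and the function name is hypothetical.

```python
# Sketch only: "outside" state for the inviscid tangency condition.
# Density and pressure are copied from the interior state (an assumption).
import math

def tangency_outside_state(rho, velocity, pressure, normal):
    """Return (rho, tangential_velocity, pressure) for the boundary face."""
    norm = math.sqrt(sum(c * c for c in normal))
    n_hat = tuple(c / norm for c in normal)
    u_n = sum(v * n for v, n in zip(velocity, n_hat))
    # Remove the normal component so the velocity lies in the surface.
    tangential = tuple(v - u_n * n for v, n in zip(velocity, n_hat))
    return rho, tangential, pressure

wall_state = tangency_outside_state(1.2, (1.0, 2.0, 3.0), 101325.0, (0.0, 0.0, 2.0))
```

The resulting state would then be passed, together with the interior state, to the same approximate Riemann solver used for interior faces.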


4.1  Interior  Boundary  Condition 

The  concept  of  boundary  conditions  has  been  generalized 
to  include  specification  of  boundary  conditions  anywhere 
in  the  computational  domain  (Ref.  12).  The  part  of  the 
boundary  condition  that  does  not  correspond  to  the  actual 
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boundary  of  the  computational  grid  is  referred  to  as 
“interior  boundary  conditions.”  In  this  case  the  user 
specifies,  among  several  attributes,  the  coordinate  location 
of  each  boundary  point  as  well  as  a  vector  normal 
associated  with  the  point.  The  need  for  the  normal  arises 
from  the  fact  that  even  though  interior  boundary  points  are 
specifiable  as  individual  points,  they  arise  from  boundary 
surfaces  that  they  are  a  part  of.  It  is  the  surface  normal 
along  with  its  location  that  describes  the  local  geometry. 
Note  that  the  surface  in  question  could  very  well  be  a  surface 
of  discontinuity  (a  shock  wave),  and  it  may  not  be  possible 
to  assign  unique  values  for  the  dependent  variables  at  the 
corresponding  boundary  point.  To  account  for  such  a 
situation,  for  every  interior  boundary  location  identified  by 
the  user,  two  interior  boundary  points  are  created  and  added 
to  the  data  base  of  the  UNIV  flow  solver.  One  of  the  added 
points  has  the  normal  pointing  one  way  and  the  second 
point  the  other  way  (Fig.  3). 
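
The pairing of interior boundary points can be pictured with a small data structure. The following Python sketch is purely illustrative (the class and function names are hypothetical and do not reflect the UNIV data base layout): each user-specified location yields two records that share coordinates and carry opposite normals.

```python
# Sketch only: one user-specified location becomes a pair of interior
# boundary points with opposite normals. Names are hypothetical.
from dataclasses import dataclass

@dataclass(frozen=True)
class InteriorBoundaryPoint:
    position: tuple
    normal: tuple

def make_pair(position, normal):
    flipped = tuple(-c for c in normal)
    return (InteriorBoundaryPoint(position, normal),
            InteriorBoundaryPoint(position, flipped))

side_1, side_2 = make_pair((0.5, 0.25, 0.0), (0.0, 1.0, 0.0))
```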




Fig.  3  Interior  boundary  points. 

The  user-specified  boundary  condition  is  applied  to  each 
pair  of  interior  boundary  points.  Certain  boundary 
conditions  such  as  surface  tangency  are  applied 
individually  to  both  points  of  the  pair;  i.e.,  they  are 
applied  in  a  decoupled  fashion.  Certain  boundary 
conditions  such  as  those  associated  with  “shock  fitting”  or 
“contact-surface  fitting”  are  applied  in  a  coupled  fashion. 
For  example,  the  values  on  the  supersonic  side  of  the  shock 
are  accepted  as  is,  and  the  values  on  the  subsonic  side  are 
computed  (along  with  the  shock  speed  value)  by  accepting 
only  the  pressure  from  the  subsonic  side  and  applying  the 
Rankine-Hugoniot  shock-jump  relations.  The  availability 
of  the  boundary  points  in  pairs  facilitates  such 
transactions. 
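
The coupled shock-fitting condition can be illustrated for a perfect gas in the direction normal to the shock. In the Python sketch below, the supersonic-side state is accepted as is, only the pressure is taken from the subsonic side, and the Rankine-Hugoniot relations supply the shock speed and the remaining subsonic-side values. The perfect-gas assumption, the one-dimensional normal frame, the sign convention (the normal points from the supersonic side to the subsonic side) and the function name are all assumptions of this example.

```python
# Sketch only: Rankine-Hugoniot update across a fitted shock, perfect gas.
# u denotes the velocity component along the shock normal, which points
# from the supersonic (upstream) side to the subsonic (downstream) side.
import math

def fit_normal_shock(rho1, u1, p1, p2, gamma=1.4):
    """Return (shock_speed, rho2, u2) given upstream state and downstream p2."""
    if p2 < p1:
        raise ValueError("downstream pressure must not be below upstream")
    a1 = math.sqrt(gamma * p1 / rho1)
    # Shock Mach number relative to the upstream flow from the pressure ratio.
    ms2 = 1.0 + (gamma + 1.0) / (2.0 * gamma) * (p2 / p1 - 1.0)
    rho2 = rho1 * (gamma + 1.0) * ms2 / ((gamma - 1.0) * ms2 + 2.0)
    w1 = math.sqrt(ms2) * a1          # upstream speed relative to the shock
    shock_speed = u1 - w1
    w2 = w1 * rho1 / rho2             # mass conservation across the shock
    return shock_speed, rho2, shock_speed + w2

a_up = math.sqrt(1.4)
speed, rho_down, u_down = fit_normal_shock(1.0, 2.0 * a_up, 1.0, 4.5)
```

The usage line corresponds to a Mach 2 flow with the pressure ratio of a steady normal shock, for which the fitted shock speed should vanish.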

The process of computing the “left” and “right” states is
modified when a cell has an interior boundary point. The
contribution  of  each  of  the  boundary  points  to  the 
quadrature  points  is  computed  using  the  proportion  of  the 
surface  area  that  is  in  the  region  of  influence  of  the 
boundary  point.  In  Fig.  3,  the  face  AB  is  completely  in  the 
region  of  influence  of  boundary  point  1,  whereas  face  CA 
gets  contributions  from  both  the  boundary  points.  Note 
that  the  boundary  points  actually  differ  only  in  the  normals 
associated  with  them,  and  their  coordinates  are  identical. 
For  the  sake  of  clarity,  they  are  shown  as  two  different 
points  in  Fig.  3. 

As  part  of  the  infrastructure  necessary  to  implement 
interior  boundary  point  treatment,  one  needs  the  ability  to 


associate  with  each  (pair  of)  interior  boundary  point  the 
cell  that  contains  it.  This  chore  of  searching  through  the 
mesh  to  determine  the  one  cell  that  contains  the  boundary 
point is efficiently accomplished in the UNIV flow solver
using  an  “octree”  sort  and  search  procedure.  Given  an 
interior  boundary  point,  an  octree  search  of  the  sorted  list 
of  node  points  of  the  mesh  quickly  yields  the  nearest  mesh 
node.  All  cells  that  contain  the  node  as  well  as  the 
common-node  neighbors  of  this  set  of  cells  are  searched,  in 
that  order,  to  determine  if  the  given  point  is  in  any  of  those 
cells.  If  not,  the  “nearest”  cell  is  identified. 
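
Once a list of candidate cells has been produced by the search, each candidate must be tested for containment. The following Python sketch shows one common way to do this for tetrahedral cells using signed volumes (barycentric coordinates). It is an illustration under stated assumptions, not the UNIV routine: the octree itself is not shown, the candidate list is taken as given, and all names are hypothetical.

```python
# Sketch only: point-in-tetrahedron test and a linear scan over candidates.
# The candidate ordering (e.g. from an octree search) is assumed given.

def signed_volume(a, b, c, d):
    ab = (b[0] - a[0], b[1] - a[1], b[2] - a[2])
    ac = (c[0] - a[0], c[1] - a[1], c[2] - a[2])
    ad = (d[0] - a[0], d[1] - a[1], d[2] - a[2])
    cx = (ac[1] * ad[2] - ac[2] * ad[1],
          ac[2] * ad[0] - ac[0] * ad[2],
          ac[0] * ad[1] - ac[1] * ad[0])
    return (ab[0] * cx[0] + ab[1] * cx[1] + ab[2] * cx[2]) / 6.0

def point_in_tet(p, verts, tol=1e-12):
    a, b, c, d = verts
    v = signed_volume(a, b, c, d)
    if abs(v) < tol:
        return False  # degenerate cell
    # Barycentric weights: replace one vertex at a time by the point p.
    weights = (signed_volume(p, b, c, d) / v,
               signed_volume(a, p, c, d) / v,
               signed_volume(a, b, p, d) / v,
               signed_volume(a, b, c, p) / v)
    return all(w >= -tol for w in weights)

def locate(p, candidate_cells):
    """Return the index of the first candidate containing p, else None."""
    for index, verts in enumerate(candidate_cells):
        if point_in_tet(p, verts):
            return index
    return None

unit_tet = ((0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (0.0, 1.0, 0.0), (0.0, 0.0, 1.0))
shifted_tet = tuple((x + 2.0, y, z) for x, y, z in unit_tet)
```

A return value of `None` corresponds to the case in which no candidate contains the point and the "nearest" cell has to be chosen instead.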

In  the  previous  paragraphs,  it  was  convenient  to  describe 
the  procedure  as  if  the  user  provides  pointwise  information 
related  to  interior  boundaries.  Depending  on  the  relative 
fineness  or  coarseness  of  the  geometry  description  of  the 
interior  surface  with  respect  to  the  surrounding  mesh,  there 
may  be  two  or  more  user-specified  (before  the  flow  solver 
replaces  each  user-specified  point  with  two  points,  with  the 
normals  facing  in  opposite  directions)  interior  boundary 
points  in  a  cell,  or  there  may  be  none  (Fig.  4).  In  Fig.  4  the 
cells  1,5,8  and  10  have  two  or  more  interior  boundary 
points  while  cells  4,7  and  9  have  none.  The  case  of 
multiple  interior  points  in  a  cell  can  be  dealt  with  easily 
(e.g.,  by  replacing  them  with  an  equivalent  single  point,  if 
necessary).  But,  the  case  of  no  interior  point  in  a  cell  that 
actually  straddles  the  interior  boundary  is  not  acceptable. 
To  avoid  such  problems,  we  start  with  the  user  describing 
the  interior  surface  as  an  unstructured  grid  (triangular 
elements).  Using  an  octree-based  sort  and  search  procedure, 
the  intersection  of  the  mesh  with  this  surface  is  identified 
(Fig.  5).  Interior  boundary  points  are  assigned  to  each  such 
intersection.  There  could  be  interior  surface  geometry 
elements  that  do  not  participate  in  such  intersections.  The 
centroids  of  these  elements  are  optionally  added  to  the  list 
of  interior  boundary  conditions. 
Fig. 4 An example of user-specified interior boundary points.

Fig. 5 An example of UNIV-generated interior boundary points.


5.  GRID  GENERATION 

The UNIVERSE-series of codes includes an unstructured grid
generator, named UNIVG. UNIVG accepts specification of
surface geometry in the form of a collection of patches. A
patch geometry could be specified either in the IGES format
or by specifying a sufficient number of non-intersecting
lines on the patch. Each line in turn is discretized by an
ordered collection of a sufficient number of points.
Triangular  elements  are  first  generated  on  the  boundary  of 
the  computational  domain  satisfying  user  specified 
clustering  requirements.  The  computational  domain  is  then 
discretized  in  the  form  of  tetrahedral  cells  using  the 
“advancing  front”  technique  (Ref.  8). 

The  method  of  “advancing  front”  does  not  possess  a  good 
mechanism  for  controlling  the  distribution  of  cells  in  the 
computational  domain,  and  often  regions  with  large 
variations  in  cell  sizes  and  shapes  are  encountered.  Such 
regions  deteriorate  the  fidelity  of  numerical  simulation.  To 
overcome  this  problem,  grid  smoothing  strategies  based 
on  constrained  optimization  techniques  have  been 
developed and employed successfully (Ref. 13).

UNIVG  has  the  capability  to  develop  an  unstructured  grid 
that includes a specified “cloud” of points as nodes. This is
sometimes useful in controlling the distribution of cells in
the  computational  domain. 

6.  STORE  SEPARATION 

The  concept  of  interior  boundary  conditions,  described  in 
section  4.1,  is  used  to  simplify  numerical  simulation  of  the 
store  separation  problem.  The  process  starts  by  generating 
an  unstructured  mesh  for  the  parent  vehicle.  The  store 
geometry  is  discretized  by  generating  an  unstructured 
surface  mesh.  The  intersection  of  the  store  surface  mesh 
with  the  unstructured  volume  cells  of  the  parent  grid  is 
determined  by  using  an  octree  sort/search  procedure. 
Centroids  of  the  intersecting  surfaces  and  their  normals  are 
computed.  A  data  base  consisting  of  these  centroids  and 
normals  is  thus  generated  and  used  as  input  by  the  interior 
boundary  condition  routines.  Note  that  the  boundary 
condition  for  the  store  accounts  for  its  initial  motion.  The 
governing equations are then solved with appropriate
boundary conditions to obtain the solution for the next time
level. Aerodynamic forces and moments for the store are
computed,  and  the  equations  for  the  conservation  of  linear 


and  angular  momenta  are  solved  to  obtain  the  location  and 
orientation  of  the  store  at  the  next  time  level.  This  process 
also  yields  velocity  vectors  at  all  points  on  the  store.  The 
intersection  of  the  store  geometry  in  its  new  location  with 
the  unstructured  volume  mesh  is  determined,  and  the 
process  is  repeated  for  all  subsequent  time  steps.  This 
approach  is  not  suitable  for  viscous  flows,  since  the  parent 
vehicle  grid  would  be  too  coarse  to  resolve  viscous  regions 
when  the  store  moves  away  from  the  parent  vehicle.  This 
problem  may  be  circumvented  by  adapting  the  mesh  as  the 
store  moves,  but  at  present  such  a  strategy  does  not  appear 
attractive  due  to  the  large  amount  of  work  involved  in 
adapting  the  mesh  and  performing  required  interpolations 
that  could  result  in  loss  of  accuracy. 
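
The rigid-body part of one time step of this procedure can be sketched in Python as follows. The example is deliberately partial and rests on assumptions: it advances only the translational motion with a simple explicit update, treats the angular velocity as a given input rather than integrating the angular momentum equation, and uses hypothetical names throughout. It also shows how the velocity at a point on the store surface follows from the center-of-mass velocity and the angular velocity.

```python
# Sketch only: translational update of the store and velocity of a surface
# point. The angular momentum equation is not integrated here; omega is an
# input. Names are hypothetical.

def cross(a, b):
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])

def advance_translation(mass, position, velocity, force, dt):
    """One explicit step of m dv/dt = F followed by dx/dt = v."""
    new_velocity = tuple(v + dt * f / mass for v, f in zip(velocity, force))
    new_position = tuple(x + dt * v for x, v in zip(position, new_velocity))
    return new_position, new_velocity

def surface_point_velocity(v_cm, omega, r_point, r_cm):
    """v = v_cm + omega x (r - r_cm) for a point fixed on the rigid store."""
    arm = tuple(p - c for p, c in zip(r_point, r_cm))
    spin = cross(omega, arm)
    return tuple(v + s for v, s in zip(v_cm, spin))

pos, vel = advance_translation(2.0, (0.0, 0.0, 0.0), (0.0, 0.0, 0.0),
                               (0.0, 0.0, -19.62), 0.1)
v_point = surface_point_velocity((1.0, 0.0, 0.0), (0.0, 0.0, 2.0),
                                 (0.0, 1.0, 0.0), (0.0, 0.0, 0.0))
```

The surface-point velocities obtained in this way are the quantities that enter the interior boundary condition for the moving store at the next time level.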


7.  TURBULENCE  MODELING 

Until  recently  all  the  turbulence  models  employed  in 
numerical  simulations  required  the  knowledge  of  the  normal 
distances  of  a  point  from  surrounding  walls.  This 
information  is  very  difficult  to  obtain  in  the  case  of 
unstructured  grids.  In  the  case  of  structured  grids,  mostly 
distances  along  grid  lines  were  employed.  This  was 
sufficient  since  the  grid  lines  were  nearly  orthogonal  in  the 
vicinity  of  a  body  where  viscous  effects  are  dominant.  But 
when  complex  geometries  requiring  a  multizone  grid 
topology  were  encountered,  it  became  difficult  to  maintain 
continuity  of  eddy  viscosity  at  zonal  interfaces.  To 
circumvent  this  problem,  a  pointwise  turbulence  model  that 
does  not  require  any  information  regarding  the  distance  of  a 
point  from  surrounding  walls  was  developed  at  Rockwell 
Science  Center  by  Goldberg  and  Ramakrishnan  (Ref.  14). 
Since  then,  several  such  models  have  been  developed,  and 
reliable  computation  of  turbulent  flows  on  structured  grids 
has  become  a  possibility. 


8.  LESSONS  LEARNED 

One of the most important lessons that we have learned
from our own experience and the experience of our peers in
the CFD and CEM community with unstructured grid
computations  is  that  in  spite  of  all  the  advances  that  have 
been  made  in  this  field  so  far,  on  comparable  grids 
structured-grid  simulations  yield  more  accurate  solutions. 
For  complex  geometries,  it  is  indeed  possible  to  speed  up 
the  preprocessing  stage  of  a  numerical  simulation  by  an 
order  of  magnitude  by  employing  unstructured  grids.  On  the 
other hand, a structured grid computation requires less CPU
time and memory and converges in fewer time
steps. For example, in the case of an inviscid flow past a
sphere with $M_\infty$ = 0.5, it has not been possible to obtain
even two orders of magnitude drop in the L-2 norm for the
net-flux vector in a reasonable number of time steps (less
than 1000) for an unstructured grid, while a structured grid
computation on a comparable grid converges by about four
orders of magnitude in less than 600 time steps. Attempts
have  been  made  to  develop  implicit  schemes  for 
unstructured  grids,  but  in  our  opinion,  a  structured  grid  still 
performs  better  as  far  as  convergence  and  accuracy  are 
concerned. 

During  the  design  phase  of  an  aerospace  configuration, 
several  possible  candidates  are  evaluated,  and  a  small 
number  of  viable  candidates  is  down  selected  from  the 
original  pool  for  further  considerations.  This  process  is 
usually  carried  out  using  a  relatively  low-level  CFD 
analysis  requiring  less  stringent  accuracy  and  convergence 
criteria.  Mostly  only  inviscid  flows  are  considered. 
Structured  grids  are  not  suitable  for  this  purpose,  since  the 
preprocessing takes an unacceptably long time.
Unstructured  Euler  solvers  offer  the  most  viable  solution. 
Since  Euler  equations,  unlike  Navier-Stokes  equations,  do 
not  require  very  fine  grids  in  the  vicinity  of  solid  bodies, 
unstructured  grid  development  becomes  much  easier  to 
handle,  and  several  solutions  for  many  different 
configurations  can  be  carried  out  in  a  matter  of  a  few  weeks. 
This was indeed demonstrated in the case of some
modifications that were carried out for the B-1B bomber.
Starting  with  the  geometry  of  the  aircraft  in  IGES  format, 
an  Euler  solution  was  obtained  for  this  complex 
configuration  (Fig.  6)  in  about  five  working  days.  With  the 
use  of  Massively  Parallel  Processing  (MPP)  computers, 
this  process  may  be  accelerated  even  more.  From  this  point 
of  view,  unstructured  grid  solvers  have  a  clear  edge  over 
their  structured  grid  counterparts. 


Fig. 6 Unstructured grid for inviscid flow past B-1B
configuration.

In  the  case  of  viscous  flows,  stringent  resolution 
requirements  in  the  direction  normal  to  a  solid  body  force 
an  unstructured  tetrahedral  grid  to  have  similar  resolutions 
on  the  body  surface  in  order  to  maintain  acceptable  shapes 
for  the  conservation  cells  in  such  regions.  This  results  in 
an  unstructured  grid  with  too  many  conservation  cells  in 
the  vicinity  of  a  solid  body.  Such  a  restriction  does  not 
exist  in  the  case  of  a  structured  grid  and  thus  makes  it  more 
suitable  for  viscous  flows. 

Arguably  a  hybrid  grid,  consisting  of  a  structured  grid  in 
the  viscous  regions  and  unstructured  grid  elsewhere  may  be 
the  most  suitable  way  to  discretize  a  computational 
domain.  But,  considering  the  fact  that  one  of  the  main 
reasons  for  resorting  to  an  unstructured  grid  is  the  difficulty 
involved  in  generating  body-conforming  structured  grids 
for  complex  geometries,  it  is  indeed  questionable  whether 
much  could  be  gained  from  such  a  strategy.  At  the  Rockwell 
Science  Center  we  have  been  experimenting  with  a  hybrid 
unstructured  grid  consisting  of  triangular  prismatic  cells  in 
the  viscous  regions  and  tetrahedral  cells  elsewhere.  This 
approach  seems  to  be  promising. 

One  aspect  of  unstructured-grid  solvers  in  which  real 
progress  has  been  made  is  the  storage  requirement.  Whereas 
the  structured  grid  solvers  require  only  about  30  words  of 
storage  per  conservation  cell,  the  unstructured  grid  solvers 


used  to  demand  about  200.  This  situation  has  been  vastly 
improved,  and  the  storage  requirement  has  been  brought 
down  to  a  manageable  60  words  per  conservation  cell. 

The concept of “interior” boundary conditions described in
section 4.1 is very promising. It was used successfully in
computing the trajectory of a store released from an F-18.
This concept also proved its usefulness in analyzing the
effect of mounting additional equipment on an aircraft.
In this case, the grid and solution from an earlier
computation could be used along with the geometry of the
added equipment to obtain the required information in a
timely manner. The present implementation of this
concept has some shortcomings. To minimize the number
of arithmetic operations, several approximations were
introduced. Instead of computing the exact contribution of
each face for updating the interior boundary points, some
simple recipes were employed. This results in
communication between the cells that lie on either side of a
solid object. That is, the interior boundary point pairs 1
and 2 in Fig. 3 interact, resulting in an erroneous
interaction between the inside and outside of the body. It
appears that shortcuts may not work, and it may be
necessary to consider the exact geometry of the
intersecting surfaces when interior boundary conditions are
encountered. Since this process is very involved, it may
not be acceptable for many problems. Alternative
solutions are currently being investigated.

One burden that we have carried over from structured-grid
algorithms to unstructured grids is the use of locally
one-dimensional approximate Riemann solvers. This conscious
introduction of a known problem is due to the lack of a better
alternative. Several multidimensional Riemann solvers
have been considered in the structured-grid world without
much success. This problem is accentuated in the case of
unstructured grids due to the difficulty in controlling cell
shapes. Of course, when sufficiently fine grids are
employed, Riemann solvers do not play a major role. But
this does not happen in the real world and hence, at least for
now, we do have to reckon with errors that arise from
locally one-dimensional approximate Riemann solvers.


9.  CONCLUSIONS 

The  most  important  conclusion  that  we  have  arrived  at  from 
our  experiments  with  unstructured  grid  computations  is  that 
they  offer  a  chance  to  prove  to  the  designers  that  CFD  is  a 
tool  not  just  for  analysis,  and  that  it  may  very  well  be  a 
better  alternative  to  existing  design  tools  such  as  panel 
methods. Structured-grid solvers will still play a major role
as an analysis tool and as a tool for understanding some
complex flow features.
1.  SUMMARY

In this paper, recent advances in the development of a new
quadratic reconstruction finite-volume scheme for unstructured
polygonal meshes are presented. The scheme is used
to discretize the two-dimensional compressible Euler and full
Navier-Stokes equations. The quadratic reconstruction is
shown to lead to a fully second-order accurate discretization of
the advective derivatives. The accuracy of the scheme is very
weakly dependent on grid distortions, a property which is very
attractive for adaptive unstructured-grid computations. The
pseudo-time integration of the equations is performed by an
implicit scheme based on Newton-Krylov techniques. The linear
system that arises from the Newton linearization is solved
by the GMRES algorithm. The incomplete LU factorization
is employed to precondition the system. The accuracy,
efficiency and robustness of the method are demonstrated on
various classical test cases corresponding to inviscid and
viscous laminar flows.

2.  LIST  OF  SYMBOLS 

$s$  conservative variables

$u$, $Q$  any flow variable

$f$, $g$  advective fluxes

$F$, $G$  viscous fluxes

$n_x$, $n_y$  edge normal components

$\Omega$  area of control volume

$\partial\Omega$  contour of control volume

$\delta_k$  length of edge $k$

$\mathcal{H}$  Hessian matrix

$O$  cell gravity center

$r$  position vector

$\Gamma$  volume surrounding a cell and bounded by its neighbors

$\Delta_i(\,)$  $(\,)_i - (\,)_0$

QUA  quadratic reconstruction

LIN  linear reconstruction

CON  constant reconstruction


‘Research  Assistant 
iF.R.I.A.  Research  Assistant 
t  Professor 

^F.N.R.S.  Research  Assistant 


ROE  Roe’s flux difference splitting

VL  Van Leer’s flux vector splitting

$\sigma$  discontinuity detector

$\Delta t$  time step

$h$  local characteristic mesh size

3.  INTRODUCTION 

During the last decade, many investigations have been carried
out to develop efficient numerical techniques for solving the
compressible Navier-Stokes equations for complex geometries.
Unstructured meshes turn out to be a useful tool to generate
grids around general configurations, and offer the powerful
capability of adaptation to local flow features. Nevertheless,
the inherent distortions present in unstructured grids cause the
classical schemes used for structured meshes to be mostly
inefficient for computing accurate solutions.

In 1990, Barth and Frederickson proposed the concept of
high-order polynomial reconstruction, also named k-exact
reconstruction schemes. Depending on a discrete set of cell
values, a high-order cell-by-cell reconstruction of the flow
variables is performed. A high-order Gauss quadrature coupled
with an approximate Riemann solver is used to evaluate the
flux balance integrals. This approach essentially corresponds
to the generalization of the Godunov method to high-order
schemes on any type of mesh. It was initially designed for
Essentially Non Oscillatory schemes. However, the application
of the latter to steady-state computations proved to be
difficult due to the large computational time requirement and to
some convergence problems. Barth developed a quadratic
reconstruction with a fixed support stencil in the frame of a
cell-vertex finite-volume scheme. At the same time, Essers et
al. proposed a non fully conservative finite-volume scheme
for structured meshes which also preserved quadratic
polynomials. Despite its lack of conservativity, the accuracy and the
robustness of the method were demonstrated by the computation
of viscous flows on very distorted meshes.

In the frame of the present research we contribute to
the work of these authors by developing a robust and accurate
scheme. The accuracy of the scheme is determined by
the use of an original quadratic reconstruction of the flow
variables and a second-order Gauss quadrature within a
cell-centered finite-volume solver. The quadratic reconstruction
can be interpreted as a higher-order generalization of the robust
Green-Gauss linear reconstruction widely employed
in unstructured solvers. The method is designed to deal with
grids that contain general polygonal cells with an arbitrary
number of edges. This provides high flexibility: hybrid grids,
which are of high interest when solving viscous flows, can be
used. Adaptation by h-refinement and coarsening is employed.
The second-order accuracy of the scheme with respect to the
discretized advective derivatives can be demonstrated, and is
achieved regardless of the amount of grid distortion. The
viscous term is discretized by using a central approximation.
The monotonicity of the solution is guaranteed by using a
discontinuity detector that switches the scheme to a constant
reconstruction.

For large problems, convergence rates obtained by explicit
methods (Runge-Kutta), even with some acceleration techniques
such as local time-stepping or residual averaging, remain
insufficient. The convergence can be sped up by employing a
multigrid approach and/or implicit schemes. Venkatakrishnan
and Barth tested a fully implicit scheme for unstructured
meshes, wherein the system arising from the Newton
linearization was solved by direct methods. However, that
attempt showed that direct methods, despite their robustness,
are plagued by prohibitive memory and computational
requirements. As an alternative, iterative implicit solvers have
been studied by many authors. The Newton-Krylov methods have
turned out to be really successful for a broad class of problems.
Within the frame of the Inexact Newton methods, iterative solvers
based on Krylov subspace generation are employed to
approximately solve the linear system that arises from the Newton
linearization. Among others, the Generalized Minimum Residual
(GMRES) algorithm of Saad and Schultz has proved to
be very efficient thanks to its robustness. We employ it in its
finite-difference version, which has the major advantage of not
requiring the storage of the system Jacobian. To accelerate the
convergence of conjugate-gradient-like algorithms,
preconditioning is highly recommended for clustering the
eigenvalues of the matrix. For that purpose, we use the
incomplete LU factorization with no fill-in.

4.  FINITE  VOLUME  DISCRETIZATION 

Consider a finite-volume discretization of the Navier-Stokes
equations onto a set of polygonal cells whose number of edges
$N_i$ can be arbitrary:

$$\int_{\Omega_i} \partial_t s \, d\Omega_i + \oint_{\partial\Omega_i} \left[ (f + F)\,n_x + (g + G)\,n_y \right] d(\partial\Omega_i) = 0 \qquad (1)$$


Within the frame of the cell-centered variant of the
finite-volume method, we associate to each polygonal cell a set of
conservative variables ($s_i$) which refer to the unknowns at the
cell gravity center (node). A second-order spatial discretization
of the time derivatives of (1) can therefore be achieved without
requiring any artifice such as, for example, the mass lumping
implicitly present in the cell-vertex technique. The
Navier-Stokes equations reduce to the following semi-discretized
conservative system of non-linear equations:

$$\frac{ds_i}{dt} = -\frac{1}{\Omega_i} \sum_{k=1}^{N_i} \left[ \int_k (f\,n_x^k + g\,n_y^k)\, dS_k + \int_k (F\,n_x^k + G\,n_y^k)\, dS_k \right] \qquad (2)$$

where the first integral is the advective flux and the second the
viscous flux through edge $k$.

Fig. 1: Control volume, quadrature points

5.  ADVECTIVE  DERIVATIVES 

Obviously the accuracy of the scheme (2) is essentially dependent
on the numerical integration of the non-linear advective
flux along the mesh edges. Two steps follow:


•  First, a reconstruction phase reconstructs the flow variables
in the cell from the discrete values at the neighboring
cell gravity centers.

•  Secondly, a high-order Gauss quadrature integrates the
upwind numerical flux computed by a Riemann solver.


5.1  Preliminary  note  on  the  order  of  accuracy 

Various definitions of the accuracy of a scheme exist in the
CFD community. In this paper, we use a definition which is
usual in the finite-difference community, i.e. we refer to the
accuracy obtained in the evaluation of the advective and
diffusive derivatives appearing in the equations. Hence,
second-order accuracy on first-order derivatives (as for the advective
part of the Euler equations) means that the error on these
first-order derivatives for any sufficiently smooth function $u$ should
decrease quadratically when the mesh is refined similarly in
all space directions. The dominant term of the truncation error
is therefore proportional to third-order derivatives times the
square of a local characteristic mesh size:

$$(\partial u)_{exact} - (\partial u)_{num} \simeq K\, h^2\, \partial^3 u \qquad (3)$$

with $K$ a constant.

A scheme discretizing advective derivatives is considered as
second-order accurate if it leads to an exact evaluation of the
latter when the corresponding flux vectors are any quadratic
(or linear) function of the cartesian coordinates of the physical
domain. Indeed, in that case, the third and higher order
derivatives appearing in the truncation error (3) obviously vanish.

In the literature, the most frequently employed cell
reconstruction uses a representation of the flow variables based on linear
polynomials. Clearly, it can only evaluate exactly a linear
function, but not a quadratic function. This means that the
dominant term of the truncation error involves a second-order
derivative and can be written:

$$(\partial u)_{exact} - (\partial u)_{num} \simeq K'\, h\, \partial^2 u \qquad (4)$$

with $K' \neq 0$ a constant.

The resulting scheme is therefore first-order accurate only.
However, when the mesh is sufficiently regular, it can be
demonstrated that $K'$ is equal to 0, and the second-order
accuracy is recovered. Similarly, for irregular meshes, the
dominant error on the advective derivative computed by a constant
reconstruction involves a first-order derivative with a
coefficient that does not tend to zero with the mesh size. For these
meshes, the constant reconstruction is thus inconsistent, but
consistency (i.e. first-order accuracy) is recovered on regular
grids.
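
The loss of one order of accuracy on irregular meshes can be observed on a one-dimensional analogue. The sketch below is an illustration under stated assumptions and not the finite-volume scheme of this paper: the two-point difference quotient, the test function sin(x) and all function names are hypothetical choices. The quotient is second-order accurate when the two neighbors are equidistant and only first-order accurate otherwise.

```python
import math

def two_point_derivative(u, x, a, b):
    """Approximate u'(x) from the neighbors located at x + a and x - b."""
    return (u(x + a) - u(x - b)) / (a + b)

def observed_order(ratio, x=1.0, h=0.1):
    """Estimate the order of accuracy by halving the mesh size h.

    ratio = b / a is the spacing ratio (1.0 corresponds to a regular mesh).
    """
    exact = math.cos(x)
    err_coarse = abs(two_point_derivative(math.sin, x, h, ratio * h) - exact)
    err_fine = abs(two_point_derivative(math.sin, x, h / 2, ratio * h / 2) - exact)
    return math.log(err_coarse / err_fine, 2)

order_regular = observed_order(ratio=1.0)    # regular spacing
order_irregular = observed_order(ratio=0.5)  # irregular spacing
```

With these settings the observed order is close to 2 for the regular spacing and close to 1 for the irregular one, which mirrors the behavior of the constant $K'$ discussed above.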

By extending the linear reconstruction to a quadratic
reconstruction defined as a Taylor series expansion of the variables
around the cell gravity center truncated at third order, a quadratic
function can be reconstructed exactly provided that the
numerical gradient and Hessian matrix are respectively computed with
a second-order and a first-order accuracy at least:

$$u_{rec}(r) = u_0 + \Delta r^T \nabla u_0 + \tfrac{1}{2}\, \Delta r^T \mathcal{H}_0\, \Delta r \qquad (5)$$

with $\Delta r = r - r_0$, and $u$ is any flow variable.

Therefore, if the numerical integration of the numerical flux
is sufficiently accurate, a second-order accurate scheme is
obtained without any assumption about the grid regularity.
This property is quite attractive when using unstructured grids,
which are usually very irregular.
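
The exactness property of the quadratic Taylor reconstruction (5) can be checked on a small example. This is a minimal sketch, assuming that the exact gradient and second derivatives are supplied; the function names and the quadratic test field are hypothetical, and the second derivatives are ordered as (u_xx, u_yy, u_xy).

```python
def reconstruct(u0, grad, hess, dr, quadratic=True):
    """Taylor reconstruction around the cell gravity center, as in eq. (5).

    grad = (u_x, u_y), hess = (u_xx, u_yy, u_xy), dr = (dx, dy).
    """
    dx, dy = dr
    value = u0 + dx * grad[0] + dy * grad[1]
    if quadratic:
        uxx, uyy, uxy = hess
        value += 0.5 * uxx * dx * dx + 0.5 * uyy * dy * dy + uxy * dx * dy
    return value

def u_exact(x, y):
    """Hypothetical quadratic test field."""
    return 1.0 + 2.0 * x - y + 3.0 * x * x + x * y - 2.0 * y * y

x0, y0 = 0.3, -0.2
grad0 = (2.0 + 6.0 * x0 + y0, -1.0 + x0 - 4.0 * y0)  # analytic gradient at (x0, y0)
hess0 = (6.0, -4.0, 1.0)                             # analytic u_xx, u_yy, u_xy
dr = (0.17, 0.05)
target = u_exact(x0 + dr[0], y0 + dr[1])

err_quadratic = abs(reconstruct(u_exact(x0, y0), grad0, hess0, dr) - target)
err_linear = abs(reconstruct(u_exact(x0, y0), grad0, hess0, dr, quadratic=False) - target)
```

The quadratic reconstruction reproduces the test field to rounding error, whereas the linear one leaves an error equal to the neglected quadratic term.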

The calculation of the flux (see eq. (2)) through each edge
is performed by a high-order numerical integration of the flux
functions using the n-point Gauss quadrature:

$$\int_k (f\,n_x^k + g\,n_y^k)\, dS_k \simeq \sum_{j=1}^{n} \omega_j \left[ f(x_j, y_j)\, n_x^k + g(x_j, y_j)\, n_y^k \right] \qquad (6)$$

where $(x_j, y_j)$ are the coordinates of the Gauss quadrature
point $j$, and $\omega_j$ denotes the weight associated with this point.

By using $n$ quadrature points, formula (6) allows the exact
integration of polynomials of degree $2n - 1$ at most.
To meet the second-order accuracy requirement described in
section 5.1, at least two quadrature points are needed in order
to compute exactly the flux integral of a quadratic polynomial
of the cartesian coordinates. Schemes employing linear
reconstruction only need one quadrature point (located
at the edge mid-point), but they are usually first-order accurate
only, as already mentioned above. Essers et al., however,
proved that a one-quadrature-point integration can produce a
full second-order scheme even for very irregular meshes. This
accuracy can only be recovered by applying a non-conservative
correction to the scheme, which definitely constitutes a
drawback with respect to the present method.

A Riemann solver such as Roe’s flux difference splitting
or Van Leer’s flux vector splitting is employed to compute
the upwind numerical flux at each quadrature point.

5.2  Reconstruction phase

5.2.1  Fixed support stencil

Contrary to the Essentially Non Oscillatory schemes, the
stencil that supports the reconstruction is fixed during the
iterations. The major advantage is an important saving of CPU
time, but an accuracy deterioration occurs in the vicinity of
discontinuities. Indeed, to satisfy some monotonicity
requirements in shocks or other regions with strong flow gradients,
the scheme should locally reduce to a constant reconstruction
whatever the methodology employed: TVD, LED, or
hybrid schemes.

5.2.2  A classical linear reconstruction

By dropping the quadratic terms of (5), a linear reconstruction
is obtained which in fact corresponds to the extension of
the classical finite-difference Fromm scheme to multiple
dimensions:

$$u_{rec}(r) = u_0 + \Delta r^T \nabla u_0 \qquad (7)$$

The gradient at the cell gravity center is computed by the
well-known robust Green-Gauss reconstruction widely employed
in unstructured grid solvers. The Green-Gauss theorem is
applied to compute the averaged gradient of $u$ over the surface
of a bounding control volume:

$$\nabla u_0 = \frac{1}{\Gamma} \oint_{\partial\Gamma} u\, \mathbf{n}\, d(\partial\Gamma) \qquad (8)$$

where $\Gamma$ denotes a bounding volume surrounding $\Omega$ and is
defined by the neighbors of $\Omega$ (fig. 1). The integral in (8) is
discretized by a summation of the contributions of the linear
segments of $\partial\Gamma$ obtained from the trapezoidal rule. It leads to
a linear combination of the values of the neighboring nodes:

$$\nabla u_0 = D_1\, \Delta u \qquad (9)$$

with $\Delta u = \left[ u_1 - u_0,\ \ldots,\ u_{N_\Omega} - u_0 \right]^T$.

$N_\Omega$ is the number of neighbors of $\Omega$, i.e. the cells connected
to $\Omega$ by at least a common edge or a common vertex. $D_1$ is
a $2 \times N_\Omega$ matrix with constant coefficients.

5.2.3  Extension of the linear to the quadratic reconstruction

By using a Taylor series expansion of $u$ around node $O$ in
(9), the truncation error $E$ corresponding to formula (9) can
be expressed as:

$$E = E_r\, \nabla^2 u_0 \qquad (10)$$

with $\nabla^2 u_0 = \left[ \partial^2_{xx} u_0,\ \partial^2_{yy} u_0,\ \partial^2_{xy} u_0 \right]^T$.

Note that $E_r$ is a $2 \times 3$ matrix containing constant
coefficients of $O(h)$. For arbitrary meshes, second-order accuracy
is nevertheless recovered by subtracting $E$ from the right-hand
side of (9):

$$\nabla u_0 = D_1\, \Delta u - E_r\, \nabla^2 u_0 \qquad (11)$$
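
The Green-Gauss gradient of eqs. (8) and (9), which eq. (11) then corrects, can be sketched as follows. This is an illustrative reading and not the authors' code: the bounding contour is assumed to be the polygon whose vertices are the neighboring nodes listed counter-clockwise, and the function name and sample data are hypothetical.

```python
def green_gauss_gradient(neighbors, values):
    """Averaged gradient over the polygon whose vertices are the neighbors.

    neighbors: list of (x, y) vertices in counter-clockwise order.
    values:    u at those vertices.
    The contour integral of u * n is evaluated segment by segment with the
    trapezoidal rule and divided by the enclosed area.
    """
    n = len(neighbors)
    area = 0.0
    gx = 0.0
    gy = 0.0
    for a in range(n):
        b = (a + 1) % n
        xa, ya = neighbors[a]
        xb, yb = neighbors[b]
        area += 0.5 * (xa * yb - xb * ya)      # shoelace formula
        u_mid = 0.5 * (values[a] + values[b])  # trapezoidal rule
        gx += u_mid * (yb - ya)                # outward n_x times segment length
        gy -= u_mid * (xb - xa)                # outward n_y times segment length
    return gx / area, gy / area

# Irregular stencil around the origin (hypothetical data).
polygon = [(1.0, 0.0), (0.3, 0.9), (-0.6, 0.5), (-0.4, -0.7), (0.5, -0.8)]
linear = [2.0 + 3.0 * x - 5.0 * y for x, y in polygon]
quadratic = [x * x for x, y in polygon]

grad_linear = green_gauss_gradient(polygon, linear)        # true gradient: (3, -5)
grad_quadratic = green_gauss_gradient(polygon, quadratic)  # true gradient at origin: (0, 0)
```

The linear field is differentiated exactly, while the quadratic field yields a non-zero x-derivative at the origin on this irregular stencil. That residual is the kind of first-order error that the correction term in (11) is meant to remove.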


This second-order numerical gradient does indeed depend on
some sufficiently accurate (first-order at least) second-order
derivatives. By substituting (11) into (5), we obtain a quadratic
reconstruction for which the only unavailable coefficients are
the second-order derivatives of $u$:

$$u_{rec}(r) = \underbrace{u_0 + \Delta r^T D_1\, \Delta u}_{\text{linear part}} + \underbrace{\left[ -\Delta r^T E_r + \left( \tfrac{1}{2}\Delta x^2,\ \tfrac{1}{2}\Delta y^2,\ \Delta x \Delta y \right) \right] \nabla^2 u_0}_{\text{quadratic part}} \qquad (12)$$

The second-order derivatives are computed by a technique
sometimes referred to as the minimum-energy reconstruction.
It simply consists in fitting the cell quadratic polynomial
$u_{rec}$ to the values of the neighboring nodes. The following
functional is minimized with respect to $\nabla^2 u_0$:

$$\sum_{i=1}^{N_\Omega} \left( u_{rec}(r_i) - u_i \right)^2 \qquad (13)$$

which is equivalent to solving, in the least-squares sense, the
following linear system of $N_\Omega$ equations and 3 unknowns:

$$(A_2 - A_1 E_r)\, \nabla^2 u_0 = (I - A_1 D_1)\, \Delta u \qquad (14)$$

with:

$$A_1 = \begin{bmatrix} \Delta x_1 & \Delta y_1 \\ \vdots & \vdots \\ \Delta x_{N_\Omega} & \Delta y_{N_\Omega} \end{bmatrix}, \qquad A_2 = \begin{bmatrix} \tfrac{1}{2}\Delta x_1^2 & \tfrac{1}{2}\Delta y_1^2 & \Delta x_1 \Delta y_1 \\ \vdots & \vdots & \vdots \\ \tfrac{1}{2}\Delta x_{N_\Omega}^2 & \tfrac{1}{2}\Delta y_{N_\Omega}^2 & \Delta x_{N_\Omega} \Delta y_{N_\Omega} \end{bmatrix}$$

The normal equations are non-singular provided that the stencil
contains at least 6 nodes. Although this condition is generally
fulfilled for interior nodes, it may not be for boundary nodes.
In that case, the stencil must be enlarged by incorporating the
neighbors of the neighbors sharing an edge with the concerned
node. The solution of (14) corresponds to a first-order
approximation of the second-order derivatives which can be expressed
as a linear combination of the nodal values:

$$\nabla^2 u_0 = D_2\, \Delta u \qquad (15)$$

with $D_2$ a $3 \times N_\Omega$ matrix with constant coefficients:

$$D_2 = \left[ (A_2 - A_1 E_r)^T (A_2 - A_1 E_r) \right]^{-1} (A_2 - A_1 E_r)^T (I - A_1 D_1)$$

All the matrices involved in the reconstruction, $E_r$, $D_1$ and
$D_2$, are preprocessed and stored.

Note that the variables $u$ are actually the primitive variables. As
a result, the gradients of the primitive variables are
directly available for the computation of the viscous fluxes.
For inviscid flows, the conservative variables could however
be used as well.

5.3  Monotonicity of the reconstruction

High-order schemes produce oscillations in the vicinity of
discontinuities. That problem can be overcome in two different
ways: either by selecting another stencil for the reconstruction
which does not involve the discontinuity (ENO schemes), or
by modifying the reconstruction within the same stencil (TVD
schemes).

The design of multidimensional limiters has been introduced
by Barth and Jespersen. However, as shown by Venkatakrishnan,
such limiters may severely hamper the convergence
to the steady state. This problem is still more dramatic
when employing implicit schemes with large CFL numbers.
Venkatakrishnan proposed some modifications to the limiter,
and obtained convergence at the price of the evaluation of an
additional constant.

We employ another approach by using the rather old idea of
hybrid schemes, here applied to the reconstruction.
The quadratic reconstruction is switched to a monotone constant
reconstruction in the vicinity of discontinuities, while in
smooth flow regions it remains unaltered.

This is easily achieved with the formulation (12), but requires
a discontinuity detector, which is taken of the form:

$$\sigma_0 = \frac{\sum_{i=1}^{N_\Omega} \left| \Delta r_i \cdot (\nabla_i u - \nabla_0 u) \right|}{\sum_{i=1}^{N_\Omega} \left[ \left( |\Delta r_i \cdot \nabla_i u| + |\Delta r_i \cdot \nabla_0 u| \right) + \gamma \left( |u_i| + |u_0| \right) \right]} \qquad (16)$$

$u$ is the pressure and/or the velocity norm. The complete form
of $\gamma$, which acts as a filter term, is given in the cited reference.

Formula (16) is an extension of the error indicator developed
by Löhner for transient finite-element computations.

By construction $\sigma_0$ is always bounded by 1, and provided a
threshold value $\beta$, a discrete discontinuity detector $\bar{\sigma}_0$ can be
defined at each node $O$:

$$\bar{\sigma}_0 = 1 \ \text{ if } \ \sigma_0 < \beta, \qquad \bar{\sigma}_0 = 0 \ \text{ if } \ \sigma_0 > \beta \qquad (17)$$

where $\beta$ is usually chosen close to 0.2, and turns out to be
relatively case independent.

The quadratic reconstruction (5) is finally modified as follows:

$$u_{rec}(r) = u_0 + \bar{\sigma}_0 \left[ \Delta r^T \nabla u_0 + \tfrac{1}{2}\, \Delta r^T \mathcal{H}_0\, \Delta r \right] \qquad (18)$$

$\bar{\sigma}_0$ is computed once at the beginning of each Newton
iteration. Unfortunately, the detector is sometimes found to
fluctuate at some nodes usually located in the neighborhood
of the discontinuities. At these locations, it may indeed appear
difficult to decide whether the cells involve the discontinuity
or not. However, after a sufficient residual decay, the iterative
process can be supposed to be close enough to the solution,
and the detector is frozen everywhere for the rest of the
convergence. A similar but more complex strategy, which has not
been tested in this paper yet, can be found in the cited reference.

6.  DIFFUSIVE  DERIVATIVES 

A centered discretization is used to discretize the viscous terms
of the Navier-Stokes equations. The viscous flux is estimated
at one quadrature point located at the mid-edge, which requires
the evaluation of the gradients of the primitive variables at the
nodes. As pointed out in section 5.2.3, these gradients have
been previously computed with a second-order accuracy during
the quadratic reconstruction phase. They are obtained at the
quadrature point by using a linear interpolation between the left
(L) and the right (R) neighbors of the edge. Strictly speaking,
that procedure is only valid if the mid-edge point lies on the
line joining the left and right neighbors. If it does not, the
following modified interpolation formula has to be used:

$$\partial_x u_P = \alpha\, \partial_x u_R + (1 - \alpha)\, \partial_x u_L + \Delta x_{PQ} \left( \alpha\, \partial^2_{xx} u_R + (1 - \alpha)\, \partial^2_{xx} u_L \right) + \Delta y_{PQ} \left( \alpha\, \partial^2_{xy} u_R + (1 - \alpha)\, \partial^2_{xy} u_L \right) \qquad (19)$$

where $P$ is the quadrature point, $Q$ the projection of $P$ on
$LR$, and $\alpha = \|LQ\| / \|LR\|$.

The accuracy of that discretization is restricted to first-order
for arbitrary meshes, but remains second-order when the mesh
is sufficiently regular. That accuracy limitation is however
not too restrictive because the truncation error is multiplied
by a usually small factor (i.e. the inverse of the freestream
Reynolds number).

7.  IMPLICIT  INTEGRATION  SCHEME 


7.1  Newton’s  method 

As our purpose is to compute steady-state solutions of the
Euler and Navier-Stokes equations, Newton’s method could
be directly applied to the steady-state equations. However,
a well-known drawback of Newton’s method is the need
for a sufficiently close initial guess in order to guarantee
convergence. A common approach to bypass that problem is
to consider the unsteady equations and to march in time.
Because fully implicit schemes are known to be unconditionally
stable, the time step $\Delta t$ is allowed to increase and finally
tend to infinity during the time-marching in order to permit
quadratic convergence when approaching the solution. The
time-marching strategy can also be interpreted as the addition
of an extra $1/\Delta t$ term to the diagonal of the Jacobian of the
steady equations in order to increase its magnitude, which in
fact corresponds to an under-relaxation procedure.

An Euler-backward time-stepping is employed to discretize
the time derivative of equations (1). A Newton-Raphson
iterative process is performed at each time step $l$ to find the
solution of the following system of non-linear equations:

$$\mathcal{F}(Q) \equiv \frac{Q - s^{l}}{\Delta t} + \mathcal{R}(Q) = 0 \qquad (20)$$

The operator $\mathcal{R}(Q)$ corresponds to the discretization of the
spatial derivatives of the Euler and Navier-Stokes equations
described in the previous sections. Finally, the whole iterative
process can be summarized in two loops:

For $l = 0, 1, \ldots$ until convergence do:

    Set $Q^{(0)} = s^{l}$

    For $n = 0, 1, \ldots$ until convergence do:

        Solve $J(Q^{(n)})\, \delta Q^{(n)} = -\mathcal{F}(Q^{(n)})$

        Set $Q^{(n+1)} = Q^{(n)} + \delta Q^{(n)}$

    Update $s^{l+1} = Q^{(n+1)}$        (21)


where $J(Q) = \partial \mathcal{F} / \partial Q$ is the Jacobian of $\mathcal{F}$.

As pointed out by Kuffer, deciding when the Newton loop
has to be stopped is not easy. A large residual decrease, which
requires many inner iterations and therefore a lot of
computational time, is not always needed. Except for unsteady
flow computations, for which equation (20) must be solved
accurately, many authors usually limit the number of inner
iterations to one ($n = 0$). The resulting descent direction is
in fact usually accurate enough to decrease the residual
satisfactorily. As the time step increases to infinity, the iterative
time-marching scheme tends to a Newton-Raphson linearization
of the steady-state equations. Restricted to one inner loop
iteration, the iterative process (21) becomes:


For $l = 0, 1, \ldots$ until convergence do:

    Solve $J(s^{l})\, \delta s = -\mathcal{R}(s^{l})$

    Update $s^{l+1} = s^{l} + \delta s$
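
The single-iteration time-marching loop can be illustrated on a scalar model problem. This is a minimal sketch under stated assumptions and not the authors' solver: the residual function, the geometric growth of the time step and all names are hypothetical, and the linear solve reduces to a division because there is only one unknown.

```python
def pseudo_transient_newton(residual, jacobian, s, dt=0.1, growth=2.0,
                            tol=1e-12, max_steps=200):
    """Perform one Newton iteration per pseudo-time step.

    Solves (1/dt + dR/ds) * ds = -R(s) for a scalar unknown and lets dt
    grow geometrically, so that the update tends to a pure Newton step.
    Returns the final iterate and the number of steps performed.
    """
    for step in range(max_steps):
        r = residual(s)
        if abs(r) < tol:
            return s, step
        ds = -r / (1.0 / dt + jacobian(s))  # 1/dt acts as under-relaxation
        s += ds
        dt *= growth
    return s, max_steps

root, steps = pseudo_transient_newton(
    residual=lambda s: s**3 + s - 2.0,   # hypothetical steady residual, root s = 1
    jacobian=lambda s: 3.0 * s**2 + 1.0,
    s=5.0,
)
```

Early steps are damped by the $1/\Delta t$ term, and as the time step grows the iteration approaches the fast terminal convergence of Newton's method.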


7.2  Inexact  Newton’s  method 

Most of the computational time required by a Newton algorithm
is devoted to the evaluation of the Jacobian
and to the solution of a linear system. The exact solution
of that system is in most cases not justified when the iterate
is far from the solution. It seems quite reasonable to
solve it approximately by using an iterative solver, which of
course saves a lot of CPU time with respect to direct methods.
Such a method is referred to as an inexact Newton method. As
shown by Dembo et al., the residual on the linear system
must however verify the following rule in order to preserve
the quadratic convergence:

$$\frac{\| r_n \|}{\| \mathcal{F}(Q^{(n)}) \|} \le \eta_n, \qquad \eta_n = \min \left\{ c\, \| \mathcal{F}(Q^{(n)}) \|^{z},\ \ldots \right\} \ \text{ with } \ 0 < z < 1 \qquad (22)$$

where $r_n = J(Q^{(n)})\, \delta Q^{(n)} + \mathcal{F}(Q^{(n)})$, $\| \cdot \|$ denotes any
arbitrary norm in $R^n$, $c$ is a constant and $\mathcal{F}$ is supposed to be
suitably scaled. In our code, we use $c = 0.5$ and $z = 0.5$.

The choice of the iterative solver is obviously essential. These
solvers can be grouped in two sets: stationary iterative methods
(Jacobi, Gauss-Seidel, SOR, etc.), and non-stationary
iterative methods (Krylov subspace algorithms, Chebyshev
iteration, etc.). This last set differs from stationary methods
in that the computations involve information that changes
at each iteration. Iterative solvers such as Jacobi and
Gauss-Seidel are attractive because they are simple and easy to
vectorize. But their main drawback is that their robustness and
convergence are only ensured when the matrix exhibits large
diagonal terms. Unfortunately, due to the stiff problems generated
by the Euler and Navier-Stokes equations, a large diagonal
for the system Jacobian requires an important under-relaxation,
which indeed destroys Newton’s quadratic convergence.
On the contrary, Krylov subspace methods such as the conjugate
gradient are known to solve complex problems in a
finite number of iterations. Many algorithms derived from the
conjugate gradient have been developed to deal with a broad
range of problems. Among others, the Generalized Minimum
Residual (GMRES) algorithm of Saad and Schultz is designed to
solve non-symmetric linear systems.

7.3  The  finite-difference  GMRES  algorithm 

The GMRES is a projection method for solving a linear system

$$A x = b \qquad (23)$$

that seeks an approximate solution $x_m$ from an affine subspace
$x_0 + \mathcal{K}_m$ of dimension $m$ by imposing the Petrov-Galerkin
condition:

$$b - A x_m \perp A \mathcal{K}_m \qquad (24)$$

$x_0$ represents an initial guess to the solution. $\mathcal{K}_m$ is the Krylov
subspace of dimension $m$:

$$\mathcal{K}_m(A, r_0) = \mathrm{span}\{ r_0,\ A r_0,\ A^2 r_0,\ \ldots,\ A^{m-1} r_0 \} \qquad (25)$$

with $r_0 = b - A x_0$.

In other words, the GMRES process successively builds an
approximate solution $x_m$ at each iteration $m$ so that $x_m$ is
orthogonal to the previous search directions in the metric of the
matrix $A$, and minimizes the residual $r_m$ in the L2 norm. More
details about the implementation can be found in the cited reference.
In our code, a block variant of the basic algorithm with restart
is used. The whole problem is indeed considered as an $n \times n$
system of $4 \times 4$ block matrices.

Typically, a Krylov solver such as GMRES does not require
the calculation of the Jacobian $J(Q)$ but only necessitates the
computation of a matrix-vector product:

$$ J(Q)\,p \tag{26} $$

where $p$ denotes any vector.

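For nonlinear equations, this action can be approximated by a
finite-difference quotient of the form:

$$ J(Q)\,p \approx \frac{\mathcal{R}(Q + \epsilon p) - \mathcal{R}(Q)}{\epsilon} \tag{27} $$

where $\mathcal{R}$ denotes the nonlinear residual whose Jacobian is $J$.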

An analysis of the convergence of what are referred to as
inexact Newton/finite-difference projection methods is given by
Brown. The interesting feature of equation (27) is that neither the
calculation nor the storage of the Jacobian is required.
Indeed, the computation of the Jacobians of the advective and
diffusive fluxes may be very complicated, and the exact Jacobian
of Roe's flux difference splitting is very expensive to compute.
Furthermore, the introduction of turbulence modelling in
future developments will also make the Jacobians difficult
to derive. The stencil of the quadratic reconstruction
usually involves an average of 9 to 13 cells. With one 4 x 4 block
(16 words) per stencil cell, the required storage would therefore
amount to 144 to 208 words per cell, which is quite expensive.

A proper choice of the parameter $\epsilon$ in (27) is given by the
analysis of Dennis and Schnabel:

$$ \epsilon \, \|p\| = \sqrt{\eta} \tag{28} $$

where $\eta$ is the machine zero and $\|\cdot\|$ represents the RMS norm.
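
The matrix-free product of equations (26) to (28) can be illustrated with a short sketch. It is not the authors' implementation: it assumes the usual forward-difference quotient for (27), takes the "machine zero" to be the double-precision machine epsilon, and uses a hypothetical two-equation residual (`toy_residual`) whose Jacobian is known exactly, so that the approximation can be checked.

```python
import math
import sys


def rms_norm(v):
    """Root-mean-square norm of a vector given as a list of floats."""
    return math.sqrt(sum(x * x for x in v) / len(v))


def jacobian_vector_product(residual, q, p, eta=sys.float_info.epsilon):
    """Approximate J(q) p by a forward difference, without forming J.

    The step eps is chosen so that eps * ||p|| = sqrt(eta), as in (28).
    `residual` is any function mapping a state list to a residual list.
    """
    norm_p = rms_norm(p)
    if norm_p == 0.0:
        return [0.0] * len(q)
    eps = math.sqrt(eta) / norm_p
    r0 = residual(q)
    r1 = residual([qi + eps * pi for qi, pi in zip(q, p)])
    return [(a - b) / eps for a, b in zip(r1, r0)]


def toy_residual(q):
    """Hypothetical nonlinear residual used only for illustration."""
    x, y = q
    return [x * x + y - 3.0, x * y - 1.0]


def toy_jacobian_times(q, p):
    """Exact Jacobian-vector product of toy_residual, for comparison."""
    x, y = q
    return [2.0 * x * p[0] + p[1], y * p[0] + x * p[1]]
```

Each product costs one extra residual evaluation, which is the price paid for never assembling or storing the Jacobian.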


7.4 Preconditioning

The convergence of Krylov solvers depends strongly on the
eigenvalues of the matrix. To accelerate the convergence, the
use of a preconditioner that clusters the eigenvalues
is strongly recommended. The preconditioner should be
as close as possible to the inverse of the matrix. In practice,
it should also allow a fast solution of the associated linear system.
Preconditioners based on stationary methods (diagonal preconditioner
or Jacobi, Gauss-Seidel, SOR) have been widely employed.
A comparison of different preconditioning techniques can be
found in Orkwis et al. and Venkatakrishnan et al. We use
the incomplete LU decomposition (ILU), which has been
demonstrated to be a very efficient preconditioning strategy.
It is generally employed in its simplest version, named ILU(0),
for which no fill-in is permitted during the LU decomposition.
In other words, the non-zero elements of the preconditioner
are located at the same positions as those of the initial matrix.
This has the advantage of a fixed and minimum memory requirement.
However, this decomposition can turn out to be
too weak for stiff problems. We have developed and currently
use a block version of ILU(0).
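
The no-fill rule of ILU(0) can be illustrated with a minimal scalar sketch. It works on a dense list-of-lists matrix for readability, whereas the code described here uses a block version on sparse storage. The helper names `ilu0` and `matmul` are hypothetical.

```python
def ilu0(a):
    """Return (L, U) for the ILU(0) factorization of a square matrix.

    Entries are updated only where the original matrix is non-zero, so
    L + U has exactly the sparsity pattern of `a` (no fill-in).
    """
    n = len(a)
    pattern = [[a[i][j] != 0.0 for j in range(n)] for i in range(n)]
    w = [row[:] for row in a]  # work on a copy
    for i in range(1, n):
        for k in range(i):
            if not pattern[i][k]:
                continue  # entry outside the pattern: skip, no fill-in
            w[i][k] /= w[k][k]
            for j in range(k + 1, n):
                if pattern[i][j]:
                    w[i][j] -= w[i][k] * w[k][j]
    lower = [[w[i][j] if j < i else (1.0 if i == j else 0.0)
              for j in range(n)] for i in range(n)]
    upper = [[w[i][j] if j >= i else 0.0 for j in range(n)]
             for i in range(n)]
    return lower, upper


def matmul(x, y):
    """Dense product of two square matrices stored as lists of lists."""
    n = len(x)
    return [[sum(x[i][k] * y[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]
```

For a tridiagonal matrix the exact LU factors create no fill-in, so ILU(0) reproduces the matrix exactly. For a matrix whose exact factors would fill in, the product L U matches the original only on its non-zero pattern, which is why the preconditioner can be too weak for stiff problems.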

The choice between right and left preconditioning is
important. Right preconditioning is preferable.
Indeed, with left preconditioning, all the residual vectors
and their norms correspond to preconditioned, and thus scaled,
residuals. It can then be difficult to know when the
algorithm should be stopped. By contrast, right
preconditioning allows the use of the actual residuals.
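
The difference between the two residuals can be seen on a small example. The sketch below assumes a simple diagonal preconditioner M as a stand-in for ILU(0), and the function name `residual_norms` is hypothetical. It evaluates the true residual b - A x, the left-preconditioned residual M^{-1}(b - A x), and the right-preconditioned residual b - (A M^{-1}) u with u = M x.

```python
import math


def matvec(a, x):
    """Dense matrix-vector product."""
    return [sum(aij * xj for aij, xj in zip(row, x)) for row in a]


def norm2(v):
    """Euclidean norm of a list of floats."""
    return math.sqrt(sum(x * x for x in v))


def residual_norms(a, b, x, m_diag):
    """Return (true, left-preconditioned, right-preconditioned) norms.

    m_diag holds the diagonal of an assumed diagonal preconditioner M.
    """
    r = [bi - ai for bi, ai in zip(b, matvec(a, x))]
    left = [ri / di for ri, di in zip(r, m_diag)]       # M^{-1} (b - A x)
    u = [di * xi for di, xi in zip(m_diag, x)]          # u = M x
    x_from_u = [ui / di for ui, di in zip(u, m_diag)]   # x = M^{-1} u
    right = [bi - ai for bi, ai in zip(b, matvec(a, x_from_u))]
    return norm2(r), norm2(left), norm2(right)
```

With a badly scaled matrix, the left-preconditioned norm can differ from the true residual norm by orders of magnitude, whereas the right-preconditioned one is the true residual, so the usual stopping test remains meaningful.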

Although the Jacobian matrix does not need to be formed in
the GMRES algorithm, an approximate form of it is
still required for the preconditioning. Because the support stencil of
the quadratic reconstruction is large, it is prohibitive
to take all the neighbors into account. As suggested by many
authors, an approximate Jacobian may be computed by using
a constant reconstruction, which only depends on the edge
neighbors (distance-one neighbors). In order to minimize the
bandwidth and thus the fill-in of the decomposition, a reverse
Cuthill-McKee ordering is performed in a preprocessing step.
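
The effect of the reordering can be sketched on the cell-connectivity graph. The version below is a simplified reverse Cuthill-McKee: a breadth-first search that starts from a lowest-degree node and visits neighbors by increasing degree, with the final order reversed. It is an illustration under these assumptions rather than the preprocessing code used here, and the `adjacency` layout (a list of neighbor lists) is hypothetical.

```python
from collections import deque


def reverse_cuthill_mckee(adjacency):
    """Return a node ordering that tends to reduce the matrix bandwidth."""
    n = len(adjacency)
    degree = [len(neighbors) for neighbors in adjacency]
    visited = [False] * n
    order = []
    # Loop over start candidates so that disconnected graphs are covered.
    for start in sorted(range(n), key=lambda v: degree[v]):
        if visited[start]:
            continue
        visited[start] = True
        queue = deque([start])
        while queue:
            v = queue.popleft()
            order.append(v)
            for u in sorted(adjacency[v], key=lambda w: degree[w]):
                if not visited[u]:
                    visited[u] = True
                    queue.append(u)
    order.reverse()
    return order


def bandwidth(adjacency, order):
    """Largest index distance between two connected nodes in `order`."""
    position = {v: i for i, v in enumerate(order)}
    return max((abs(position[v] - position[u])
                for v in range(len(adjacency)) for u in adjacency[v]),
               default=0)
```

A smaller bandwidth keeps the non-zero blocks close to the diagonal, which limits the entries that an incomplete factorization has to discard.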

In most of the results presented in this paper, Roe's flux
difference splitting is employed. Deriving analytically even an
approximate form of the Jacobian of this scheme is quite a complex
and expensive task. One alternative is to use
the easily available Jacobian of Van Leer's scheme in the
preconditioner, which costs 2 to 3 times less computational
time than the Jacobian of Roe's scheme. The two
preconditioners are compared in the section devoted to the
presentation of the results.

It should be mentioned that, up to now, the contribution of the
viscous flux Jacobian is not included in the preconditioner.

7.5  Time  step  increment  control 

As explained in the previous section, Newton's method is
implemented in a time-stepping form. The evolution in time
is controlled by the time step. During the time marching, the
time step is increased towards infinity in order to ultimately achieve
the quadratic convergence of Newton's method. As in the work of many
authors, this is performed by employing an empirical formula in which the
CFL number varies according to the inverse of a residual
norm:


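$$ CFL^{i+1} = \frac{CFL_0}{\|\mathcal{R}(Q^i)\|^{p}} \tag{29} $$

where $\|\mathcal{R}(Q^i)\|$ is the residual norm at iteration $i$.
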
Two parameters remain to be tuned in order
to optimize the convergence rate: the initial CFL number $CFL_0$
and the exponent $p$. Typical values of these parameters
are $CFL_0 = 10$ and $p = 0.5$.
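
The update rule can be sketched as follows. The sketch assumes that the residual norm is normalized by its initial value, so that the first step uses the initial CFL number, and it adds a hypothetical upper bound `cfl_max` to stand in for an "infinite" CFL number. Neither detail is stated in the text.

```python
def next_cfl(cfl0, residual_norm, initial_residual_norm, p=0.5, cfl_max=1.0e12):
    """CFL number for the next step, growing as the residual decays.

    cfl0: initial CFL number (typically 10 in the text).
    p:    exponent applied to the inverse normalized residual norm.
    """
    if residual_norm <= 0.0:
        return cfl_max  # fully converged: take the largest allowed step
    ratio = initial_residual_norm / residual_norm
    return min(cfl0 * ratio ** p, cfl_max)
```

With the typical values quoted above, a residual drop of four orders of magnitude raises the CFL number from 10 to about 1000. A larger exponent makes the growth more aggressive.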


8. BOUNDARY CONDITIONS

The treatment of the boundary conditions has a strong influence
on the convergence of an implicit scheme. For inviscid
flow computations, we use a very convenient procedure, which
consists in imposing the boundary conditions in a weak manner
by modifying the advective flux through the boundary
edges. Hence, according to the boundary type, some of
the flow variables are imposed at the quadrature points of the
edges, and the others are computed from their values at interior
nodes using extrapolation formulas similar to those used to
evaluate the left and right values at the quadrature points of inner
edges. For viscous flow computations, inlet and outlet
boundaries are treated in a similar way as inviscid boundaries.
At the solid walls, the viscous flux is modified in
order to impose the no-slip boundary condition.

That method, however, generally turns out to be too weak to
correctly satisfy the no-slip condition. Two additional procedures
have been tested. The first corresponds to the introduction
of dummy nodes in the stencil of the boundary cells. These
dummy nodes are located at the mid-points of the boundary edges.
The flow variables at these nodes are extrapolated or imposed
by the no-slip boundary condition before each evaluation of
the flow derivatives. That method has been implemented in a
fully implicit manner and successfully used for the flat plate
boundary layer computation. Unfortunately, the results are not as
good for more complex flow computations. For these flows,
we have tested another procedure. The boundary nodes are
no longer located at the cell center of gravity, but at the mid-point
of the boundary edge, and the no-slip boundary condition is imposed
in its strong form at each Newton iteration.

Finally, notice that it is essential to include the contribution
of the boundary conditions in the preconditioner. The Jacobian
of the modified boundary advective flux is calculated
analytically for most of the boundary conditions, except for
the subsonic inlet. For the latter, it is derived from a
finite-difference formula similar to equation (27).

9. MESH ADAPTATION

The possibility of using a flexible local grid adaptation procedure
is a major advantage of unstructured meshes. The
objective is to improve the resolution of the flow by successive
refinements and coarsenings while minimizing the
number of points involved in the mesh. The edge data structure
employed in the code and the relative insensitivity of
the accuracy of the numerical scheme to grid distortions
allow the use of very general polygonal cells and, as a result,
of somewhat distorted meshes. We developed a very general
adaptation strategy based on mesh enrichment and coarsening.
The method is based on an error indicator of the form (16).
Cells whose error indicator lies above a preset threshold are
candidates for refinement, while those whose error indicator
lies below another preset value are candidates for coarsening.
The refinement strategy, which is implemented for any type
of polygon, is described in the cited reference. In particular,
triangles and quadrangles can be divided anisotropically depending
on the value of an anisotropy sensor based on the standard
deviations of the gradients of a flow parameter computed in
the directions pointing to the different neighbors of the cell.
Two types of coarsening procedures are considered. The first
one is based on the refinement history. A tree containing the
information between successive meshes is updated during the
refinements. It is then rather easy to delete the "son" cells and to
recover the "parent". The second method is more general and
coarsens the grid by deleting vertices and recombining others
to build larger polygons.

10. RESULTS

10.1 Subsonic sine-bump

The effect of the various reconstructions (quadratic, linear,
constant) has first been tested by computing the inviscid
subsonic flow ($M_\infty = 0.5$) in a channel perturbed by a sine
bump with a mesh of 1294 cells (fig. 2a). The geometry is
defined as follows:

Lower wall:

$-0.7 < x < 0$ : $y = 0$

$0 < x < 1$ : $y = 0.05\,[1 + \sin(2\pi x - \pi/2)]$

$1 < x < 1.7$ : $y = 0$

Upper wall: $-0.7 < x < 1.7$ : $y = 0.7$

Roe's scheme is employed as Riemann solver. The solutions
have been computed for an infinite value of the CFL
number and a maximum number of GMRES iterations equal to
60 with a restart every 30 iterations. Figures 2c and 2d show
the evolution of the Mach number and the total pressure on
the lower wall. The quadratic reconstruction clearly
leads to the lowest spurious entropy generation (fig. 2d).
Hence, it predicts the highest peak Mach number: 0.835. For
the sake of comparison, the peak values calculated
with the linear and the constant reconstructions are equal to
0.804 and 0.754 respectively. Compared with the other reconstructions
(results not shown), the symmetry of the solution obtained with
the quadratic scheme is almost perfect, as can be seen from the
iso-Mach line pattern (fig. 2b).

Fig. 2e illustrates the dramatic convergence improvement obtained
with the implicit scheme with respect to an explicit
integration using a 3-step Runge-Kutta algorithm. The
preconditioner based on the approximate Roe flux difference
splitting yields the fastest convergence in terms of CPU time
and number of Newton iterations (fig. 2f). The quadratic
reconstruction, however, takes about 25 % more CPU time than
the linear reconstruction scheme to achieve the same residual
decay.
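
For readers who wish to regenerate the sine-bump channel of section 10.1, the wall definition can be written as two small functions. This is only a sketch of the geometry as stated in the text. The function names are hypothetical and no mesh generation is implied.

```python
import math

X_MIN, X_MAX = -0.7, 1.7   # channel extent in x
UPPER_WALL = 0.7           # constant height of the upper wall


def lower_wall(x):
    """Height of the lower wall: flat except for a sine bump on 0 < x < 1."""
    if not X_MIN <= x <= X_MAX:
        raise ValueError("x lies outside the channel")
    if 0.0 < x < 1.0:
        return 0.05 * (1.0 + math.sin(2.0 * math.pi * x - math.pi / 2.0))
    return 0.0


def channel_height(x):
    """Local distance between the upper and lower walls."""
    return UPPER_WALL - lower_wall(x)
```

The phase shift of pi/2 makes the bump start and end at zero height, and the bump reaches its maximum height of 0.1 at mid-length.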

10.2  Subcritical  NACA0012  airfoil 

The second test case again illustrates the accuracy gain obtained
with the quadratic reconstruction scheme. The inviscid
subsonic flow over the NACA0012 airfoil has been computed
at a freestream Mach number of 0.63 and an incidence of 2 deg.
The mesh contains 4537 cells (fig. 3a). The far-field boundary
is located at a distance of 20 chords from the airfoil.
The starting solution corresponds to the uniform flow. The
solutions computed with the various reconstructions behave
similarly on the lower wall (fig. 3b). However, larger discrepancies
occur on the upper wall because of the strong flow acceleration.
The highest peak Mach number (0.981) is again obtained
with the quadratic reconstruction and agrees very well with the
value computed by Paillere (0.983) on the same mesh with
a fluctuation splitting scheme. Figure 3d shows the evolution
of the total pressure along the wall. Notice that the level of
spurious entropy generated by the quadratic reconstruction is
very low. The lift coefficient $C_l = 0.323$ compares well
with the value computed by Paillere ($C_l = 0.322$), and
the purely numerical pressure drag coefficient is found to be very
low, $C_d = 0.00034$. The lift coefficient is however slightly
lower than the exact one predicted by a full potential method
($C_l = 0.334$). That difference can be explained by the fact
that no vortex correction is imposed at the far-field boundary.
The error between the present value and the exact
one is equal to 3.4 %. According to the work of Thomas
and Salas, a computation with a far-field boundary at about 20 chords
and with no vortex correction should underpredict the lift
coefficient by about 4 %.

The influence of the exponent $p$ of the CFL update formula
(29) on the convergence has been tested (fig. 3e and 3f). The
code diverges when the computation is initiated with an infinite
CFL number. The convergence history obtained when
GMRES is replaced by an SOR iterative solver is also provided
in fig. 3e and 3f. Figure 3f clearly shows that the
quadratic convergence of Newton's method is never reached with the
SOR strategy. Nevertheless, this strategy turns out to be competitive
in terms of computational cost (fig. 3e).

10.3  Transonic  flow  over  a  circular  arc  bump 

To test the accuracy and the performance of the scheme for
flows with shock waves, the code is applied to a
classical test case: the inviscid transonic flow ($M_\infty = 0.85$)
over a circular arc bump in a channel. The use of implicit
Newton-Krylov techniques for flows with discontinuities
remains difficult because of the modifications of the reconstruction
that are required to preserve the monotonicity of the
solutions. As explained in section 5.3, the quadratic reconstruction
(18) modified by the discontinuity detector is employed to
achieve monotone solutions. For this test case, Van Leer's
flux vector splitting is used. The computation is started from a
solution previously computed with the constant reconstruction
scheme. Indeed, one of the major reported disadvantages of
implicit Newton methods is the time required by the shocks to
migrate to their correct location. During that phase, the residual
stagnates. That prevents the CFL number from increasing
to infinity, and therefore dramatically slows down the convergence.
We currently use the following two remedies: the starting
solution is obtained with a cheap low-order scheme, and
we use a grid sequencing strategy with mesh adaptation. The
initial mesh contains 1420 rectangular cells (fig. 4a). After
three adaptations, the final grid (fig. 4b) involves a lower number
of cells (1296), which are very general polygons. The total
computational time (not shown here) amounts to 400 CPU sec.
on an HP9000/730 workstation (infinite CFL number). Fig.
4c shows the points where the detector is automatically activated.
In order to avoid endless switching of the detector, it is frozen after
5 Newton iterations. As can be seen in fig. 4d and 4e, a
very crisp shock is captured. Different convergence histories
for computations performed on the initial mesh are presented
in fig. 4g and 4h. The fastest convergence is again obtained
with an infinite CFL number. Those figures also show that
an exponent $p$ equal to 2 yields a similar convergence history.
For the sake of comparison, the GMRES algorithm appears to
be about 3 times faster than the SOR scheme in terms of
computational time.

10.4 Inviscid hypersonic flow over a double-ellipse

We now consider the inviscid flow over the double-ellipse test
case proposed in the workshop of Antibes, at a 30 deg. angle
of attack and a Mach number of 8.15. The initial mesh of
2412 triangles (fig. 5a) is adapted three times (9527 cells,
fig. 5b). The iso-Mach line pattern is presented in fig. 5c.
Next to it, fig. 5d shows the nodes where discontinuities are
automatically detected. Convergence histories are presented
for the computation on the final adapted mesh. Notice in
fig. 5e the dramatic convergence improvement obtained with the
implicit scheme with respect to a 4-step explicit Runge-Kutta
scheme. Fig. 5h, 5g and 5i respectively give the evolution
of the Mach number, the pressure coefficient and the total
pressure along the windward and leeward sides. Our results
are compared with those obtained by Gustafsson et al. and
Khalfallah et al. published in the workshop proceedings.
The pressure coefficient and the Mach number agree with the
results of the latter authors. Notice the fair agreement between
the computed total pressure and the exact one, which can be
obtained from normal shock theory (less than 0.02 % error
on the leeward side).

10.5  Supersonic  flow  around  a  NACA0012  airfoil 

The supersonic flow over the NACA0012 airfoil (Mach = 1.2,
angle of incidence = 0 deg.) illustrates the flexibility of the
adaptation technique and the preservation of the accuracy of
the scheme even on very distorted grids. The calculation is
started on the triangular mesh shown in fig. 6a. Three adaptations
are performed. The final mesh is made of polygons with
a number of edges varying from 3 to 7 (fig. 6b and 6c). The
detached shock and the oblique shocks attached to the trailing
edge are well captured (fig. 6d). The distribution of the Mach
number on the upstream and downstream parts of the x-axis
as well as along the airfoil is presented in fig. 6e. The present
calculation is performed with a 4-step explicit Runge-Kutta
scheme. Up to now, no attempt has been made to use the implicit
scheme for this test case.

10.6 Laminar viscous flow over a flat plate

The accuracy of the Navier-Stokes code has been assessed
by investigating the development of a laminar compressible
boundary layer over an adiabatic flat plate. For that calculation,
the Mach and Prandtl numbers are taken
equal to 0.5 and 1 respectively. The viscosity is proportional to the
temperature (Crocco's viscosity law) in order to compare our
results with the exact solution predicted by boundary layer
theory. The computation is performed with a full quadratic
reconstruction and Roe's flux difference splitting. The
initial CFL number is equal to 10 and the exponent $p$ is
0.5. We noticed quite a strong numerical influence of the
downstream boundary condition (inviscid subsonic outlet with
imposed pressure), which forced us to locate that boundary quite
far from the leading edge of the plate, i.e. at a Reynolds number
based on x equal to 10,000. The mesh contains rectangular
cells. There is an average of 15 cells in the displacement thickness
of the boundary layer. Excellent agreement is found
between the computed and exact velocity and temperature profiles
(fig. 7c). Fig. 7b shows the evolution of the skin friction
coefficient along the plate, which also agrees very well with
the exact one. Notice also the good agreement, especially in
the outer part of the boundary layer, between the computed and
exact shear stress (fig. 7d). Unfortunately, some deviation occurs
near the wall. Our latest investigations suggest that it is
caused by a perturbation coming from the downstream
boundary condition. The problem must be studied further. The
convergence history is reported in fig. 7a. The relatively slow
convergence is attributed on the one hand to the fact that no
contribution of the viscous term Jacobians is included in the
preconditioner, and on the other hand to the weakness of the
ILU(0) decomposition.

10.7 Laminar viscous flow over the NACA0012 airfoil

In this final test case, we consider the laminar flow over a
NACA0012 airfoil at 0 deg. incidence with a freestream Mach
number of 0.5 and a Reynolds number of 5000. The wall
is adiabatic. The Sutherland viscosity law is employed and
the Prandtl number is equal to 0.72. The flexibility of the
method is illustrated by employing a hybrid grid (fig. 8a).
It consists of a structured C-type part around the airfoil and
in the wake, surrounded by a triangular mesh. The far-field
boundary is located at a distance of 33 chords from the airfoil.
The cell aspect ratio varies from 100 near the wall to 50,000
in the wake. The iso-Mach line pattern presented in fig. 8b
shows the development of the boundary layer and its separation
near the trailing edge to form a small recirculation bubble.
The pressure and skin friction coefficients are presented in
fig. 8c and 8d. The accuracy of the results may be
estimated by comparing the location of the separation point
(in percent of the chord) and the magnitudes of the pressure
and viscous drag coefficients. We obtain $x_{sep} = 81.7\,\%$,
$C_{dp} = 0.0227$, $C_{dv} = 0.0320$. These results agree with the
reference values obtained by Swanson and Turkel on a 518
x 128 structured mesh ($x_{sep} = 81.4\,\%$, $C_{dp} = 0.02235$,
$C_{dv} = 0.03299$). Notice however that the present mesh
only involves 7709 cells and is relatively coarse in the leading
edge region, which is responsible for a slight underprediction
of the skin friction. We obtain a maximum peak value of
0.143 instead of the reference value 0.15. Moreover, the
longitudinal cell dimension in the region of the separation point is
also relatively large: about 1 % of the chord.

11.  CONCLUSION 

In this paper, an original quadratic reconstruction finite-volume
scheme for solving the Euler and full Navier-Stokes equations
has been presented. The quadratic reconstruction is a
higher-order extension of the robust Green-Gauss linear
reconstruction. The accuracy of the resulting discretized advective
derivatives is second-order, and is insensitive to grid distortions.
The robustness and the high accuracy of the scheme
have been demonstrated by various computations on very
distorted meshes. The Newton-Krylov method based on the
GMRES iterative solver has been successfully used to dramatically
improve the convergence to steady state with respect
to explicit methods. The implicit scheme has been tested on
fully subsonic, transonic and supersonic inviscid flows, and on
laminar viscous flow computations. For transonic and
supersonic flows, a discrete discontinuity detector is employed
to switch the scheme to a monotone constant reconstruction.
This alternative does not encounter the major problems that
classical multidimensional limiters have in driving the convergence
to machine accuracy. For inviscid flow test cases in which
Roe's flux difference splitting is employed, the preconditioner
based on an approximate Jacobian of Roe's flux difference
splitting always leads to faster convergence than a
preconditioner based on Van Leer's flux vector splitting,
although the latter is much cheaper to compute. For viscous flow
computations, the ability of the scheme to deal with hybrid grids
is a real advantage. The quadratic reconstruction has led to
very accurate solutions. However, the proper imposition of
the boundary conditions remains a problem. Two methods
have been tested. The first one, which modifies the stencil
of the reconstruction for boundary cells to include the effect
of the boundary conditions, has been successfully applied to
a flat plate boundary layer computation. However, another procedure
was required for the computation of a laminar viscous
flow around the NACA0012 airfoil. It consists in locating the
boundary nodes on the boundary edges rather than at the cell
center of gravity and then applying the boundary conditions in
their strong form. This modified strategy, which is explicit,
unfortunately perturbs the convergence for nodes
near solid walls. More effort should also be devoted to the
improvement of the preconditioner, which seems to be too
weak for viscous flow computations.

Abstract

The aim of this contribution is to present the first numerical
results that we have obtained with a new second-order
kinetic-theory-based scheme. The main interest
of our approach is that density and internal energy
can be proved to remain non-negative under a CFL-like
condition. It is well known that classical approximate
Riemann solvers, even first-order accurate ones, do not
satisfy this property. Our first-order scheme is the classical
kinetic scheme, based on the Maxwellian velocity distribution.
Our second-order extension consists of adding to
the first-order numerical flux an antidiffusive correction,
which has to be limited so that the positivity constraints
are satisfied. It can be seen as a variant of the
so-called corrected anti-diffusive flux approach. We have
performed numerical computations for various two- and
three-dimensional test cases on unstructured and
self-adaptive meshes, in order to evaluate the accuracy and
the robustness of this new method. Comparisons have
been made with a second-order extension of Roe's scheme
(with the MUSCL approach).

Introduction 

The aim of this contribution is to present the first
numerical results obtained with a new second-order
scheme based on the kinetic theory of gases.
The main interest of our approach lies in the fact
that the density and the pressure can be proved to remain
positive under a CFL-type condition. It is well
known that classical schemes built on an approximate
Riemann solver do not have this property,
even at first order [3]. This is a serious
drawback when one wishes to compute flows
in which the density is very low or in which
the internal energy is small compared with the kinetic energy
(hypersonic flows, detonation problems,
...). Note moreover that our approach can be generalized
without difficulty to reacting flows [12,16].

At first order, our scheme is none other than the
classical kinetic scheme based on the Maxwellian velocity
distribution, introduced by Pullin in [19]. It

can be shown that this scheme preserves the positivity of the
density and of the pressure under a CFL-type condition
[11]. Our second-order extension consists in adding
to the first-order numerical flux an antidiffusive correction,
which must be limited so that positivity
is preserved. This approach can be regarded
as a variant of the so-called 'modified flux' method.

The meshes used are structured or unstructured;
in addition, an automatic mesh refinement
technique has been implemented in the
2D and 3D codes. We have carried out numerous numerical
simulations on different types of meshes in order
to evaluate the robustness and the accuracy of this new
scheme. Comparisons have also been made with
Roe's scheme extended to second order following
Van Leer's MUSCL method.

The paper is organized as follows. We begin
with some general remarks on kinetic schemes
and recall their main properties.
In the second part, we present the principle of our
second-order extension. The third part is devoted
to a mesh refinement criterion
(based on the local entropy production of the scheme) and
to the description of the mesh refinement technique
that we have used. Finally, in the last
part, we present numerous numerical results and
comparisons with Roe's scheme.

1 General remarks on kinetic schemes

The first kinetic scheme for the Euler equations
was introduced by D. Pullin in [19]. It was then
revisited and improved by S. Deshpande in [4,5].
Other kinetic schemes, based on equilibrium
distributions different from the Maxwellian, were
subsequently proposed by various authors, including Kaniel and
Perthame [7,8,9]. The great merit of the work of
B. Perthame is to have been the first to establish
the theoretical properties of certain kinetic schemes
(those associated with compactly supported equilibrium
distributions): consistency with the entropy inequality,
preservation of positivity, etc. Finally, we mention the
work of Mazet et al. concerning the links between
kinetic schemes and the symmetrization of the Euler
equations via the entropy variables [1,12,11]. We shall
return to this aspect in the third part of this
article.
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1.1  Definition  d’un  schema  cinetique 

Kinetic schemes are upwind finite-volume schemes whose
numerical flux function $\mathcal{F}(w,w',n)$ is of
flux-splitting type, that is, of the form

$$\mathcal{F}(w,w',n) = \mathcal{F}^{+}(w,n) + \mathcal{F}^{-}(w',n)$$

where $w$ and $w'$ are two arbitrary states and $n$ is a unit
vector of $\mathbb{R}^d$ (the normal to the interface between
two cells). The functions $\mathcal{F}^{+}(w,n)$ and
$\mathcal{F}^{-}(w,n)$ are expressed as integrals over what
statistical physics calls 'phase space'. For a monatomic gas,
for example:

$$\mathcal{F}^{\pm}(w,n) = \int_{\pm\,\xi\cdot n\,>\,0} (\xi\cdot n)
\begin{pmatrix} 1 \\ \xi \\ \tfrac12|\xi|^2 \end{pmatrix}
f_w(\xi)\,d\xi \qquad (1)$$

where $f_w(\xi)$ denotes the equilibrium distribution of the
particles and satisfies, by definition, the moment relations

$$\int_{\mathbb{R}^d}
\begin{pmatrix} 1 \\ \xi \\ \tfrac12|\xi|^2 \end{pmatrix}
f_w(\xi)\,d\xi = \begin{pmatrix} \rho \\ \rho U \\ \rho E \end{pmatrix} = w$$

Formula (1) can be interpreted by considering that the
particles crossing an interface consist of those coming from
the left and moving in the direction of the normal
(contribution to $\mathcal{F}^{+}$) and those coming from the
right and moving in the direction opposite to the normal
(contribution to $\mathcal{F}^{-}$).

figure 1: Particle flux across an interface

Different choices are possible for the equilibrium function
$f_w(\xi)$. The two most common are the 'crenel' function
proposed by Perthame [7,8]

$$f_w(\xi) = \frac{\rho}{\mathrm{Vol}(B^d)\,(2rT)^{d/2}}\;
Y\!\left(\frac{|\xi-U|^2}{2rT}\right)$$

(where $Y$ is the indicator function of $[0,1]$ and $B^d$ the
unit ball of $\mathbb{R}^d$) and the 'Maxwellian' function
proposed by Pullin [19]

$$f_w(\xi) = \frac{\rho}{(2\pi rT)^{d/2}}\,
\exp\!\left(-\frac{|\xi-U|^2}{2rT}\right)$$

It is the latter that we have chosen, because it is the best
suited to the extension to real gas mixtures [12] and it leads
to the simplest formulas for the extension to second order [14].

Formula (1) can of course be made explicit when $f_w$ is a
Maxwellian. For a perfect gas with equation of state
$p = \rho rT$ and $e = f(T)$ ($f$ an arbitrary smooth
function), one obtains:

$$\mathcal{F}^{\pm}(w,n) = \pm\,h(X)\,c
\begin{pmatrix} \rho \\ \rho U \\ \tfrac12\rho|U|^2 + \rho e(T) + \tfrac12 p \end{pmatrix}
+ g(\pm X)
\begin{pmatrix} \rho\,U\cdot n \\ \rho U\,(U\cdot n) + p\,n \\
\left(\tfrac12\rho|U|^2 + \rho e(T) + p\right) U\cdot n \end{pmatrix}
\qquad (2)$$

where

$$c = \sqrt{2rT}, \qquad X = \frac{U\cdot n}{c}, \qquad
g(X) = \frac{1}{\sqrt{\pi}}\int_{-X}^{+\infty} \exp(-u^2)\,du, \qquad
h(X) = \frac{1}{2\sqrt{\pi}}\exp(-X^2)$$

For a more complete presentation of kinetic schemes, and in
particular for the precise definition of an equilibrium
distribution, the reader may refer for example to the paper by
B. Perthame [7].
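As a rough illustration of formula (2), the sketch below evaluates the Maxwellian split fluxes in one space dimension and checks that they add up to the physical Euler flux. It is only a sketch under stated assumptions: the function names, the gas constants r = 287 and gamma = 1.4, and the polytropic law e = rT/(gamma - 1) are choices made here for illustration and do not come from the paper.

```python
import math

def kinetic_split_flux_1d(rho, u, T, sign, r=287.0, gamma=1.4):
    """Split flux F^+ (sign=+1) or F^- (sign=-1) of formula (2) in 1D.

    Assumption: polytropic perfect gas, e(T) = r*T/(gamma-1).
    """
    p = rho * r * T
    e = r * T / (gamma - 1.0)
    c = math.sqrt(2.0 * r * T)
    X = u / c
    # g(+-X) = (1/sqrt(pi)) * integral_{-+X}^{inf} exp(-t^2) dt = (1 + erf(+-X)) / 2
    g = 0.5 * (1.0 + math.erf(sign * X))
    h = math.exp(-X * X) / (2.0 * math.sqrt(math.pi))
    rhoE = 0.5 * rho * u * u + rho * e
    transport = (rho * u, rho * u * u + p, (rhoE + p) * u)   # vector weighted by g(+-X)
    thermal = (rho, rho * u, rhoE + 0.5 * p)                 # vector weighted by +-h(X)*c
    return tuple(g * a + sign * h * c * b for a, b in zip(transport, thermal))

def euler_flux_1d(rho, u, T, r=287.0, gamma=1.4):
    """Physical Euler flux F(w) in 1D for the same gas model."""
    p = rho * r * T
    rhoE = 0.5 * rho * u * u + rho * r * T / (gamma - 1.0)
    return (rho * u, rho * u * u + p, (rhoE + p) * u)

def numerical_flux_1d(left, right):
    """Flux-splitting numerical flux F(w, w') = F^+(w) + F^-(w'); states are (rho, u, T)."""
    f_plus = kinetic_split_flux_1d(*left, sign=+1)
    f_minus = kinetic_split_flux_1d(*right, sign=-1)
    return tuple(a + b for a, b in zip(f_plus, f_minus))
```

Calling `numerical_flux_1d(state, state)` with the same state on both sides returns the exact Euler flux, which is the consistency property expected of any flux-splitting scheme.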


1.2  Some properties of kinetic schemes

Expression (1) for the functions $\mathcal{F}^{+}$ and
$\mathcal{F}^{-}$ makes it possible to prove many properties of
kinetic schemes. In particular, in one space dimension, when
the support of the equilibrium distribution is a compact set of
the form $[-\xi_{max}, \xi_{max}]$, one can show [7] that the
associated kinetic schemes are entropy-satisfying and preserve
the positivity of density and temperature under the CFL
condition $\Delta t < \Delta x/\xi_{max}$.

For a Maxwellian distribution (unbounded support), the problem
is more delicate and, to our knowledge, consistency with the
entropy inequality has only been proved formally in [5,11].
The question of the positivity of the scheme, on the other
hand, was settled in [11,14]. We recall the main result below.

To be as general as possible, we work in a multidimensional
setting ($d$ denoting the number of space dimensions) and the
mesh, denoted $\mathcal{M}_h$, is assumed arbitrary. We denote
by $K$ an arbitrary element of $\mathcal{M}_h$, $m(K)$ its
Lebesgue measure in $\mathbb{R}^d$, $K_e$ its neighbour across
face $e$, $n_{K,e}$ the normal to face $e$ directed from $K$ to
$K_e$ and $m(e)$ the Lebesgue measure of face $e$ in
$\mathbb{R}^{d-1}$ (see figure 2).


figure 2: Partial view of the mesh

The following proposition is proved in [14]:

Proposition 1 The finite-volume scheme

$$w_K^{n+1} = w_K^{n} - \frac{\Delta t}{m(K)} \sum_{e\in\partial K}
\mathcal{F}(w_K^{n}, w_{K_e}^{n}, n_{K,e})\, m(e)$$

associated with the numerical flux defined by formulas (2)
(Maxwellian distribution) preserves the positivity of $\rho$
and $T$ under a CFL condition of the form

$$\sup_{K\in\mathcal{M}_h}\,[\,\ldots\,] \qquad (3)$$

[the full expression of condition (3) is not legible in the
source].

Remark: In practice this condition is slightly more restrictive
than the usual CFL condition (for a perfect gas with
$\gamma = 1.4$ it corresponds roughly to CFL = 0.5). However,
it is only a sufficient condition and, in applications, we have
never observed any difficulty with CFL = 0.9.


2  Second-order extension and positivity

2.1  Principle of the second-order extension

To extend a finite-volume method to second order in space,
there are at least two classical approaches:

•  The first (probably the most widely used, owing to its
simplicity and generality) is Van Leer's MUSCL method. It
consists in splitting a time step into two stages: a first
stage of affine interpolation of the approximate solution,
and a second stage in which the finite-volume scheme is
applied to the interpolated values of the approximate
solution. The essential point is that, during the
interpolation stage, the gradient of the approximate
solution must be limited in order to avoid the appearance of
oscillations.

•  The second, known in the English-language literature as the
'corrected antidiffusive flux approach' (denoted CAFA
hereafter), consists in adding to the first-order numerical
flux an antidiffusive correction, which must be limited for
reasons of numerical stability.

Both methods have been thoroughly studied from a theoretical
standpoint in the case of a scalar conservation law (see for
example Godlewski-Raviart [6], Coquel-LeFloch [2], ...). In
particular, in that case one can give precise criteria on how
the slopes must be limited for the numerical method to be
stable in the BV norm (TVB scheme) or in the $L^\infty$ norm.


For general systems of conservation laws, and in particular
for the Euler equations, there is currently no theory. One
therefore generally reasons by analogy with the scalar case in
order to derive empirical stability criteria. Moreover, in gas
dynamics there is the additional fact that the solution
$w = (\rho, \rho U, \rho E)$ does not take its values in the
whole of $\mathbb{R}^{d+2}$ but only in a subset of it,
$W_{ad}$, defined by the positivity constraints $\rho > 0$,
$\rho E - \tfrac12\rho|U|^2 > 0$. Whatever the approach
adopted, MUSCL or CAFA, it is necessary to add to (or possibly
substitute for) these empirical slope-limiting or flux-limiting
criteria, which control oscillations, a condition guaranteeing
that the scheme leaves the set $W_{ad}$ invariant (which of
course presupposes that the first-order scheme already has this
property). It is interesting to note that this invariance
property of $W_{ad}$ alone guarantees the stability of the
scheme in the $L^1$ norm [9] and thus constitutes a weak
stability criterion.

In the case of the MUSCL method, a variant that guarantees the
preservation of positivity has been proposed by Perthame and
Qiu. The principle is to construct the interpolated solution
at each time step so that $\rho$, $\rho U$ and $\rho E$ are
conserved on each cell and $\rho$ and $T$ are positive at the
mesh nodes [10]. However, the numerical results are rather
disappointing in terms of gain in accuracy, and their
reconstruction technique appears difficult to generalize to
arbitrary meshes.

The approach we propose in this paper is rather a variant of
the 'CAFA' method. It can a priori be extended to any
flux-splitting scheme (this aspect is developed in [14]), but
kinetic schemes nevertheless have two essential advantages:

•  Positivity can be proved for the first-order scheme, which
is not the case, for example, for the flux-splitting schemes
of Steger and Warming or of Van Leer.

•  Thanks to the integral representation (1) of the numerical
flux, the limitations that must be imposed on the
antidiffusive corrections for the scheme to preserve
positivity can be written explicitly.

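We recall here the main lines of this approach, referring to
[14] for the details. The general idea is to replace, on each
interface $e$ of the mesh, $\mathcal{F}^{+}(w_K, n_{K,e})$ by
$\mathcal{F}^{+}(w_K, n_{K,e}) + \Delta\mathcal{F}^{+}_{K,e}$
and $\mathcal{F}^{-}(w_{K_e}, n_{K,e})$ by
$\mathcal{F}^{-}(w_{K_e}, n_{K,e}) + \Delta\mathcal{F}^{-}_{K,e}$,
where the functions $\mathcal{F}^{+}$ and $\mathcal{F}^{-}$ are
defined by formulas (2) (corresponding to a Maxwellian
distribution) and $\Delta\mathcal{F}^{+}_{K,e}$ and
$\Delta\mathcal{F}^{-}_{K,e}$ are antidiffusive corrections
which, for a monatomic gas, are expressed in the following
integral form:

$$\Delta\mathcal{F}^{+}_{K,e} = \int_{\xi\cdot n_{K,e}>0} (\xi\cdot n_{K,e})
\begin{pmatrix} 1 \\ \xi \\ \tfrac12|\xi|^2 \end{pmatrix}
\Delta f_{w_K}(\xi)\,d\xi \qquad (4)$$

$$\Delta\mathcal{F}^{-}_{K,e} = \int_{\xi\cdot n_{K,e}<0} (\xi\cdot n_{K,e})
\begin{pmatrix} 1 \\ \xi \\ \tfrac12|\xi|^2 \end{pmatrix}
\Delta f_{w_{K_e}}(\xi)\,d\xi \qquad (5)$$

where $\Delta f_w(\xi)$ is a correction to the equilibrium
distribution which, as in the Chapman-Enskog expansion [18], is
a function of the local gradients of the solution. When $f_w$
is a Maxwellian, its expression is given in [14] [the formula
is not legible in the source]. It involves sums over the space
directions $i = 1, \ldots, d$, the crenel function

$$y(x) = 1 \ \text{if}\ |x| < \xi_{max}, \qquad
y(x) = 0 \ \text{if}\ |x| > \xi_{max}$$

and the increments $(\delta s)_{K,e}$, $(\delta U^i)_{K,e}$ and
$(\delta T)_{K,e}$. Each increment is the sum of a spatial
term, the gradient estimate applied to the displacement
$x_e - x_K$, and a temporal term proportional to the
time-derivative estimate [these formulas are only partly
legible in the source]. Here $x_e$ denotes the centre of face
$e$, $x_K$ the centre of gravity of cell $K$,
$(\nabla s)_K^n$, $(\nabla U^i)_K^n$ and $(\nabla T)_K^n$
estimates of the spatial gradients of $s$, $U^i$ and $T$ in
cell $K$ at time $n$, and $(\partial_t s)_K^n$,
$(\partial_t U^i)_K^n$ and $(\partial_t T)_K^n$ estimates of
the time derivatives of $s$, $U^i$ and $T$ in cell $K$ at time
$n$. The choices made to compute these quantities, as well as
the value of $\xi_{max}$, are specified below.

NB: Note that $\delta s$, $\delta U$ and $\delta T$ are $O(h)$
quantities ($h$ being the mesh size) when the solution is
smooth.

From the kinetic point of view, formulas (4) and (5) can be
interpreted by considering that the Maxwellian equilibrium
distribution $f_w(\xi)$ has been replaced by a 'perturbed'
equilibrium distribution $f_w(\xi) + \Delta f_w(\xi)$ that
takes the gradients of the solution into account (this point
of view is presented in [15]). This idea was already present
in the work of S. Deshpande [5]. Introducing a crenel function
$y(x)$ in the definition of $\Delta f_w$ guarantees that the
modified distribution $f_w(\xi) + \Delta f_w(\xi)$ always
remains positive (provided $\xi_{max}$ is suitably chosen).
This property plays an essential role in the positivity of the
second-order scheme.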

The following result is proved in [14]:

Proposition 2 If, for every $K \in \mathcal{M}_h$ and every
$e \in \partial K$, conditions (i) and (ii) hold [their
expressions are not legible in the source; (ii) fixes
$\xi_{max}$ in terms of $(\delta s)_{K,e}$,
$|(\delta U)_{K,e}|$ and $(\delta T)_{K,e}$], with

$$|(\delta U)_{K,e}| = \sum_{i=1}^{d} |(\delta U^i)_{K,e}|$$

then the scheme described above is second order in time and in
space if $\mathcal{M}_h$ is a Cartesian grid and if the
gradients are estimated to second order. Moreover, it
preserves the positivity of $\rho$ and $T$ under a CFL
condition of the form $\sup\,[\,\ldots\,]$ [expression not
legible in the source].

Remarks:

•  No 'min-mod' type limitation is needed to guarantee the
positivity of the scheme. A numerical illustration is given
in [14]. However, limitations (i) and (ii) are not
sufficient to fully control the appearance of spatial
oscillations. In practice they must be combined with other,
more classical limitations of 'min-mod' type, which are made
explicit in the next subsection.

•  The CFL condition above is somewhat too restrictive. In
applications, we have never observed any difficulty with
CFL = 0.9.

•  All the results presented above generalize to a polyatomic
perfect gas with arbitrary $\gamma$. It suffices to increase
the dimension of phase space to account for the internal
degrees of freedom of the molecules. The positivity
condition then involves an additional constraint on
$|\delta T|$. We refer to [14,13] for the details and for
explicit formulas for computing $\Delta\mathcal{F}^{+}$ and
$\Delta\mathcal{F}^{-}$.


2.2  Computing and limiting the gradients of the discrete solution

Many solutions are proposed in the literature for estimating
the gradients of the discrete solution from its values in each
cell of the mesh. For the sake of simplicity, and also for
reasons related to our data structure, we chose the following
formula:

$$\nabla q_K = \frac{1}{2\,m(K)} \sum_{e\in\partial K}
(q_K + q_{K_e})\, n_{K,e}\, m(e) \qquad (6)$$

where $q$ denotes any component of $w$. This formula is
second-order accurate in $h$ if the solution is smooth and the
mesh is Cartesian.
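Formula (6) can be sketched in a few lines. The helper below is a hypothetical illustration, not the authors' code: the function name and the face data layout (a list of neighbour value, unit normal, face measure) are assumptions made here.

```python
def green_gauss_gradient(q_K, faces, m_K):
    """Cell gradient from formula (6).

    q_K   : value of q in cell K
    faces : list of (q_Ke, n_Ke, m_e) with q_Ke the neighbour value,
            n_Ke the unit outward normal (tuple) and m_e the face measure
    m_K   : measure (area or volume) of cell K
    """
    dim = len(faces[0][1])
    grad = [0.0] * dim
    for q_Ke, normal, m_e in faces:
        for i in range(dim):
            grad[i] += (q_K + q_Ke) * normal[i] * m_e
    return [g / (2.0 * m_K) for g in grad]
```

On a uniform Cartesian grid this reduces to centred differences, so it reproduces the gradient of a linear field exactly. For a square cell of side h = 0.5 centred at the origin and q(x, y) = 2x + 3y + 1:

```python
h = 0.5
q = lambda x, y: 2.0 * x + 3.0 * y + 1.0
faces = [(q(+h, 0.0), (1.0, 0.0), h), (q(-h, 0.0), (-1.0, 0.0), h),
         (q(0.0, +h), (0.0, 1.0), h), (q(0.0, -h), (0.0, -1.0), h)]
grad = green_gauss_gradient(q(0.0, 0.0), faces, h * h)
```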


To limit the gradients obtained, we used a multidimensional
generalization of the 'min-mod' limiter, which consists in
requiring that the Min and Max values of the function
$q(x) = q_K + \nabla q_K\cdot(x - x_K)$ at the face centres of
element $K$ (and not at the vertices of element $K$, which
would be more restrictive) lie between the Min and the Max of
the values of $q$ on the elements neighbouring $K$. In
practice one first computes the Min and Max values of $q(x)$
on element $K$ (denoted $q_{min}$ and $q_{max}$) as well as the
Min and Max values of $q$ over the neighbouring elements
(denoted $Q_{min}$ and $Q_{max}$). One then sets:

$$\alpha_{max} = \mathrm{Max}\!\left(0,\ \frac{Q_{max} - q_K}{q_{max} - q_K}\right), \qquad
\alpha_{min} = \mathrm{Max}\!\left(0,\ \frac{Q_{min} - q_K}{q_{min} - q_K}\right)$$

$$\alpha = \mathrm{Min}(1,\ \alpha_{min},\ \alpha_{max})$$

Finally one takes:

$$\nabla q_K^{lim} = \alpha\,\nabla q_K$$
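The limiting step can be sketched as follows. This is a minimal illustration, assuming that the reconstructed face-centre values and the neighbour values are already available as lists; the function name is hypothetical, and the treatment of a zero denominator (no constraint on that side) is an assumption added here, since the text does not discuss it.

```python
def limiter_coefficient(q_K, face_values, neighbour_values):
    """Coefficient alpha such that grad_lim = alpha * grad.

    face_values      : unlimited reconstruction q(x) at the face centres of K
    neighbour_values : values of q in the elements neighbouring K
    """
    q_min, q_max = min(face_values), max(face_values)
    Q_min, Q_max = min(neighbour_values), max(neighbour_values)
    # Assumption: if the reconstruction does not exceed q_K on one side,
    # that side imposes no constraint (avoids a division by zero).
    alpha_max = max(0.0, (Q_max - q_K) / (q_max - q_K)) if q_max > q_K else 1.0
    alpha_min = max(0.0, (Q_min - q_K) / (q_min - q_K)) if q_min < q_K else 1.0
    return min(1.0, alpha_min, alpha_max)
```

With q_K = 1, face values between 0.5 and 1.5 and neighbour values between 0.8 and 1.2, the coefficient is about 0.4, so the limited reconstruction stays within the neighbour bounds. When q_K is a local extremum of the neighbour values, the coefficient drops to 0 and the reconstruction becomes first order in that cell.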

The time derivatives of the discrete solution are estimated
from the non-conservative form of the Euler equations. In
non-conservative form, the gas dynamics system can indeed be
written:

$$\partial_t s = -U\cdot\nabla s$$

$$\partial_t U = -(U\cdot\nabla)U - \frac{1}{\rho}\nabla p$$

$$\partial_t T = -U\cdot\nabla T - (\gamma - 1)\,T\,\mathrm{div}\,U$$

The desired estimates of $\partial_t s$, $\partial_t U$ and
$\partial_t T$ are obtained from these relations and from the
values of $\nabla s$, $\nabla p$, $\nabla T$ and $\nabla U$
computed with formula (6).
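In one space dimension these relations translate directly into code. The sketch below assumes that the gradients have already been estimated (for instance with formula (6)); the function name and argument names are hypothetical.

```python
def time_derivative_estimates_1d(rho, u, T, ds_dx, du_dx, dp_dx, dT_dx, gamma=1.4):
    """Estimates of (ds/dt, du/dt, dT/dt) from the non-conservative Euler equations in 1D."""
    ds_dt = -u * ds_dx                                  # entropy is advected
    du_dt = -u * du_dx - dp_dx / rho                    # momentum equation
    dT_dt = -u * dT_dx - (gamma - 1.0) * T * du_dx      # temperature equation
    return ds_dt, du_dt, dT_dt
```

A uniform state (all gradients zero) gives zero time derivatives, as expected for a steady uniform flow.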

3  Refinement criterion and self-adaptive meshes

3.1  Description of the mesh refinement technique

In order to improve the accuracy of the results for steady
flow computations, an automatic mesh refinement procedure has
been implemented in the 2D and 3D codes. The principle is as
follows:

1.  A first steady solution is computed on the initial coarse
mesh.

2.  The value of the refinement criterion is then computed on
each element of the mesh and, depending on this criterion,
some elements are refined following a procedure described
below.

3.  A new solution is then computed on the refined mesh,
starting from the solution on the previous mesh.

4.  The process is repeated if necessary.

Our code uses 2 types of elements in dimension 2 (triangles
and quadrilaterals) and 3 in dimension 3 (tetrahedra,
pentahedra and hexahedra). These different elements may be
distributed arbitrarily within a single mesh. In particular,
nodal coincidence between two neighbouring elements is not
required, as illustrated for example in figure 3.


figure 3: Partial view of a non-conforming mesh

To refine an element, the principle is in all cases to divide
it into a certain number of child elements (4 in dimension 2,
8 in dimension 3), all similar to the original element. Each
element is of course refined independently of its neighbours,
so that after a refinement phase the resulting mesh is
generally non-conforming (no nodal coincidence between certain
elements). Figures 4 and 5 sketch the refinement of a triangle
and of a quadrilateral. In dimension 3, we refer to the work
of J. Delaire [20] for a detailed description of the
refinement procedure.
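For a triangle, one common way to obtain 4 children similar to the parent is to join the edge midpoints. The sketch below assumes this is the subdivision intended (the figures are not available in this text); the function names are hypothetical.

```python
def refine_triangle(a, b, c):
    """Split triangle (a, b, c) into 4 similar children by joining edge midpoints."""
    def midpoint(p, q):
        return tuple(0.5 * (pi + qi) for pi, qi in zip(p, q))
    ab, bc, ca = midpoint(a, b), midpoint(b, c), midpoint(c, a)
    # three corner children plus the central child
    return [(a, ab, ca), (ab, b, bc), (ca, bc, c), (ab, bc, ca)]

def triangle_area(tri):
    """Unsigned area of a 2D triangle given as three (x, y) vertices."""
    (x0, y0), (x1, y1), (x2, y2) = tri
    return 0.5 * abs((x1 - x0) * (y2 - y0) - (x2 - x0) * (y1 - y0))
```

Each child has one quarter of the parent's area. Because a neighbouring element is not refined at the same time, the new midpoint nodes are hanging nodes for that neighbour, which is why the refined mesh is non-conforming.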


figure 4: Refinement of a triangle


figure 5: Refinement of a quadrilateral

3.2  A refinement criterion based on local entropy production

We now describe the refinement criterion used. This criterion
was introduced by P. Mazet et al. in [1,11]. It rests on the
links between kinetic schemes and the symmetrization, via
entropy variables, of the Euler equations.

We begin with a few reminders on the symmetrization of the
Euler equations. To simplify the notation, we restrict
ourselves to a polytropic perfect gas. Let
$s = r\log\!\left(T^{1/(\gamma-1)}/\rho\right)$ be the specific
entropy of the gas. It is well known that the function
$S(w) = -\rho s$ (where $w$ is the vector of conservative
variables) is strictly convex in $w$ and is a Lax entropy for
the Euler equations, associated with the entropy flux
$U\,S(w)$. The entropy variables $\phi$ are then defined as
follows:

$$\phi_{\rho} = \frac{\partial S}{\partial \rho}
= \frac{\gamma r}{\gamma - 1} - s - \frac{|U|^2}{2T}, \qquad
\phi_{\rho U} = \frac{\partial S}{\partial (\rho U)} = \frac{U}{T}, \qquad
\phi_{\rho E} = \frac{\partial S}{\partial (\rho E)} = -\frac{1}{T}
\qquad (7)$$

Relations (7) define a bijective change of variables
$w \to \phi$ from $W_{ad}$ onto
$\Phi_{ad} = \mathbb{R}^{d+1} \times \mathbb{R}^{*}_{-}$. The
Legendre transform $S^{*}$ of $S$ is then defined on
$\Phi_{ad}$ by:

$$S^{*}(\phi) = -S(w(\phi)) + w(\phi)\cdot\phi$$

Similarly, we introduce a pseudo-Legendre transform of the
entropy flux in the direction of the vector $n$, denoted
$\Sigma^{*}(\phi,n)$, by setting:

$$\Sigma^{*}(\phi,n) = -U\cdot n\,S(w(\phi)) + (F(w(\phi))\cdot n)\cdot\phi$$

where $F(w)\cdot n = [\rho U\cdot n,\ \rho U\,(U\cdot n) + p\,n,\
(\rho E + p)\,U\cdot n]^T$ denotes the flux in the direction
$n$. This function is called the symmetrization function of
the Euler system because, by construction, it has the
following property:

$$F(w(\phi))\cdot n = \nabla_{\phi}\Sigma^{*}(\phi,n) \qquad (8)$$

The flux of the Euler equations, expressed in entropy
variables, is therefore the gradient of the symmetrization
function $\Sigma^{*}(\phi,n)$. It follows immediately that,
written in entropy variables, the gas dynamics system is
symmetric. Moreover, by (8), any decomposition of
$\Sigma^{*}(\phi,n)$ as the sum of two functions
$\Sigma^{*+}(\phi,n)$ and $\Sigma^{*-}(\phi,n)$ induces, by
differentiation with respect to $\phi$, a decomposition of the
flux $F(w)\cdot n$ as the sum of two functions $F^{+}(w,n)$
and $F^{-}(w,n)$. If in addition, for every $n$,
$\Sigma^{*+}(\phi,n)$ is convex in $\phi$ and
$\Sigma^{*-}(\phi,n)$ is concave, then one can show (see [11])
that the Jacobian of $F^{+}(w,n)$ is diagonalizable with
positive eigenvalues and that the Jacobian of $F^{-}(w,n)$ is
diagonalizable with negative eigenvalues. The decomposition of
a symmetrization function into convex and concave parts thus
provides a natural way of constructing correctly upwinded
flux-splitting schemes.

The link with the kinetic formalism comes from the fact that
the function $\Sigma^{*}$ can be written, up to a
multiplicative constant, in the following integral form
(monatomic gas):

$$\Sigma^{*}(\phi,n) = \int_{\mathbb{R}^d} (\xi\cdot n)\,
\exp\!\left[\left(\phi_{\rho} + \phi_{\rho U}\cdot\xi
+ \phi_{\rho E}\,\tfrac{|\xi|^2}{2}\right)/r\right] d\xi$$

The function $\exp[(\phi_{\rho} + \phi_{\rho U}\cdot\xi +
\phi_{\rho E}|\xi|^2/2)/r]$ is none other than the Maxwellian.
Since the function exp is convex on $\mathbb{R}$, a
convex-concave decomposition of $\Sigma^{*}$ is obtained simply
by setting:

$$\Sigma^{*\pm}(\phi,n) = \int_{\pm\,\xi\cdot n\,>\,0} (\xi\cdot n)\,
\exp\!\left[\left(\phi_{\rho} + \phi_{\rho U}\cdot\xi
+ \phi_{\rho E}\,\tfrac{|\xi|^2}{2}\right)/r\right] d\xi
\qquad (10)$$

Differentiating (10) with respect to $\phi$, one easily sees
that the functions $F^{+}(w,n)$ and $F^{-}(w,n)$ so obtained
are none other than the functions $\mathcal{F}^{+}(w,n)$ and
$\mathcal{F}^{-}(w,n)$ defined by (1). One can therefore
write:

$$\mathcal{F}^{\pm}(w,n) = \nabla_{\phi}\Sigma^{*\pm}(\phi(w),n) \qquad (11)$$

We now use property (11) and the concavity of the function
$\Sigma^{*-}(\phi(w),n)$ to establish an estimate of the local
entropy production of the first-order kinetic scheme
introduced in section 1, once the steady state has been
reached. By definition, when the steady state is reached, one
has on every cell $K$ of the mesh:

$$\sum_{e\in\partial K} \left[\mathcal{F}^{+}(w_K, n_{K,e})
+ \mathcal{F}^{-}(w_{K_e}, n_{K,e})\right] m(e) = 0$$

Using the fact that
$\sum_{e\in\partial K} F(w_K)\cdot n_{K,e}\, m(e) = 0$ (Green's
formula), and subtracting this zero quantity from the
left-hand side of the previous equality, we obtain:

$$\sum_{e\in\partial K} \left[\mathcal{F}^{-}(w_{K_e}, n_{K,e})
- \mathcal{F}^{-}(w_K, n_{K,e})\right] m(e) = 0 \qquad (12)$$

Furthermore, let us set:

$$\Sigma^{\pm}(w,n) = -\Sigma^{*\pm}(\phi(w),n)
+ \mathcal{F}^{\pm}(w,n)\cdot\phi(w) \qquad (13)$$

It follows from the definition of $\Sigma^{*}$ that:

$$\Sigma^{+}(w,n) + \Sigma^{-}(w,n) = U\cdot n\,S(w)$$

The pair $[\Sigma^{+}(w,n), \Sigma^{-}(w,n)]$ therefore
constitutes a decomposition of the entropy flux. Taking the
scalar product of equality (12) with $\phi_K = \phi(w_K)$ and
using (13), (11) and Green's formula, we obtain:

$$\sum_{e\in\partial K} \left[\Sigma^{+}(w_K, n_{K,e})
+ \Sigma^{-}(w_{K_e}, n_{K,e})\right] m(e) - Q_K = 0 \qquad (14)$$

with:

$$Q_K = \sum_{e\in\partial K} \left[\Sigma^{*-}(\phi_K, n_{K,e})
- \Sigma^{*-}(\phi_{K_e}, n_{K,e})
- \nabla_{\phi}\Sigma^{*-}(\phi_{K_e}, n_{K,e})\cdot(\phi_K - \phi_{K_e})\right] m(e)$$

Since the function $\Sigma^{*-}(\cdot,n)$ is concave, the term
$Q_K$ is negative. It can be interpreted as the local entropy
production on cell $K$, the other term in (14) being simply a
flux term. It is this quantity that we used as the refinement
criterion; the explicit computation of the function
$\Sigma^{*-}$ is easily carried out using definition (10).
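For readers who want the intermediate steps between (12) and (14), the following derivation is one way to fill them in. It relies only on relations (11), (12), (13) and Green's formula as stated in the text; all functions are evaluated with the normal $n_{K,e}$, which is omitted below for brevity (this shorthand is introduced here and is not the authors' notation).

Taking the scalar product of (12) with $\phi_K$:

$$\sum_{e\in\partial K} \left[\mathcal{F}^{-}(w_{K_e})\cdot\phi_K - \mathcal{F}^{-}(w_K)\cdot\phi_K\right] m(e) = 0$$

By (13) applied at $w_K$:

$$\mathcal{F}^{-}(w_K)\cdot\phi_K = \Sigma^{-}(w_K) + \Sigma^{*-}(\phi_K)$$

By (13) applied at $w_{K_e}$, then (11):

$$\mathcal{F}^{-}(w_{K_e})\cdot\phi_K
= \mathcal{F}^{-}(w_{K_e})\cdot\phi_{K_e} + \mathcal{F}^{-}(w_{K_e})\cdot(\phi_K - \phi_{K_e})
= \Sigma^{-}(w_{K_e}) + \Sigma^{*-}(\phi_{K_e}) + \nabla_{\phi}\Sigma^{*-}(\phi_{K_e})\cdot(\phi_K - \phi_{K_e})$$

Green's formula applied to the constant entropy flux $U_K S(w_K)$, together with $\Sigma^{+} + \Sigma^{-} = U\cdot n\,S$, gives:

$$-\sum_{e\in\partial K} \Sigma^{-}(w_K)\, m(e) = \sum_{e\in\partial K} \Sigma^{+}(w_K)\, m(e)$$

Substituting these three relations into the first equality yields:

$$\sum_{e\in\partial K} \left[\Sigma^{+}(w_K) + \Sigma^{-}(w_{K_e})\right] m(e)
- \sum_{e\in\partial K} \left[\Sigma^{*-}(\phi_K) - \Sigma^{*-}(\phi_{K_e})
- \nabla_{\phi}\Sigma^{*-}(\phi_{K_e})\cdot(\phi_K - \phi_{K_e})\right] m(e) = 0$$

which is (14). The sign of $Q_K$ then follows from the fact that a concave function lies below each of its tangent planes: $\Sigma^{*-}(\phi_K) \le \Sigma^{*-}(\phi_{K_e}) + \nabla_{\phi}\Sigma^{*-}(\phi_{K_e})\cdot(\phi_K - \phi_{K_e})$, so every bracket in $Q_K$ is non-positive.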


4  Implicit scheme

For 3D steady flow computations, an explicit time scheme
proved too expensive, even with a local time-stepping
technique. The scheme described in section 2 was therefore
made implicit, following the classical principle of partially
linearizing the nonlinear system that must be solved at each
time step. From this standpoint, kinetic schemes have an
interesting feature: the functions $\mathcal{F}^{+}$ and
$\mathcal{F}^{-}$ are differentiable and homogeneous of degree
1. They therefore satisfy the following relations:

$$\mathcal{F}^{+}(w,n) = [\mathrm{Jac}(\mathcal{F}^{+})(w,n)]\cdot w$$

$$\mathcal{F}^{-}(w,n) = [\mathrm{Jac}(\mathcal{F}^{-})(w,n)]\cdot w$$

which simplifies the linearized form of the implicit scheme.
The latter can thus be written:

$$\frac{m(K)}{\Delta t}\,w_K^{n+1}
+ \sum_{e\in\partial K} [\mathrm{Jac}(\mathcal{F}^{+})(w_K^{n}, n_{K,e})]\,w_K^{n+1}\,m(e)
+ \sum_{e\in\partial K} [\mathrm{Jac}(\mathcal{F}^{-})(w_{K_e}^{n}, n_{K,e})]\,w_{K_e}^{n+1}\,m(e)
= \frac{m(K)}{\Delta t}\,w_K^{n}
- \sum_{e\in\partial K} \left[\Delta\mathcal{F}^{+}_{K,e} + \Delta\mathcal{F}^{-}_{K,e}\right]^{n} m(e)$$

The second-order correction is not treated implicitly, in
order to simplify the expression of the Jacobian matrix of the
numerical flux. No stability problem has been encountered as a
result.
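The homogeneity relation can be checked numerically. The sketch below is an illustration only: it works in one space dimension, assumes a polytropic perfect gas with r = 287 and gamma = 1.4, writes the split flux of formula (2) in terms of the conservative variables, and approximates the Jacobian by central finite differences (the paper does not say how the Jacobian is computed). All function names are hypothetical.

```python
import math

R_GAS, GAMMA = 287.0, 1.4  # assumed gas constants, not taken from the paper

def split_flux(w, sign):
    """F^+ (sign=+1) or F^- (sign=-1) of formula (2) in 1D; w = (rho, rho*u, rho*E)."""
    rho, m, rhoE = w
    u = m / rho
    T = (GAMMA - 1.0) * (rhoE / rho - 0.5 * u * u) / R_GAS
    p = rho * R_GAS * T
    c = math.sqrt(2.0 * R_GAS * T)
    X = u / c
    g = 0.5 * (1.0 + math.erf(sign * X))
    h = math.exp(-X * X) / (2.0 * math.sqrt(math.pi))
    return [g * m + sign * h * c * rho,
            g * (m * u + p) + sign * h * c * m,
            g * (rhoE + p) * u + sign * h * c * (rhoE + 0.5 * p)]

def jacobian_times_w(w, sign, rel_step=1e-6):
    """Approximate Jac(F^+-)(w) . w with a central finite-difference Jacobian."""
    result = [0.0, 0.0, 0.0]
    for j in range(3):
        dw = rel_step * max(abs(w[j]), 1.0)
        w_plus, w_minus = list(w), list(w)
        w_plus[j] += dw
        w_minus[j] -= dw
        f_plus, f_minus = split_flux(w_plus, sign), split_flux(w_minus, sign)
        for i in range(3):
            result[i] += (f_plus[i] - f_minus[i]) / (2.0 * dw) * w[j]
    return result
```

Scaling the state, `split_flux([2 * x for x in w], sign)`, doubles the flux, and `jacobian_times_w(w, sign)` agrees with `split_flux(w, sign)` up to finite-difference error, which is the property exploited in the linearized scheme.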

At each time step, the linear system above is solved by an
iterative method. Two were compared: the Jacobi method and the
BICGstab method [17]. If moderate accuracy is accepted for
each solve (which is sufficient in practice), the Jacobi
method is slightly more efficient in CPU time. The trend is
reversed if very high accuracy is required. Using a
preconditioner with the BICGstab method markedly improves the
convergence rate (by a factor of at least 2) but brings no
gain in CPU time, given the cost of the preconditioning.
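The Jacobi iteration mentioned above can be sketched on a small dense system. This is a generic textbook version, assuming a diagonally dominant matrix (which is the situation when the m(K)/Δt diagonal term is large enough); the function name, the stopping test and the scalar storage are assumptions made here, whereas the actual solver works on a sparse block system.

```python
def jacobi_solve(A, b, tol=1e-3, max_iter=1000):
    """Solve A x = b by Jacobi iteration; returns (x, number_of_iterations).

    Stops when the largest change between two iterates falls below tol.
    """
    n = len(b)
    x = [0.0] * n
    for iteration in range(1, max_iter + 1):
        x_new = [(b[i] - sum(A[i][j] * x[j] for j in range(n) if j != i)) / A[i][i]
                 for i in range(n)]
        change = max(abs(x_new[i] - x[i]) for i in range(n))
        x = x_new
        if change < tol:
            return x, iteration
    return x, max_iter
```

A loose tolerance stops after a handful of sweeps, which is consistent with the remark that moderate accuracy per solve is enough in practice:

```python
A = [[4.0, 1.0, 0.0], [1.0, 4.0, 1.0], [0.0, 1.0, 4.0]]
b = [5.0, 6.0, 5.0]                      # exact solution is [1, 1, 1]
x_loose, it_loose = jacobi_solve(A, b, tol=1e-3)
x_tight, it_tight = jacobi_solve(A, b, tol=1e-12)
```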


5  Numerical results

In order to assess the accuracy and robustness of this new
scheme, we carried out several numerical experiments.

First, to illustrate the preservation of positivity
numerically, we computed the solution of the Riemann problem
proposed by Sjogreen in [3], for which the solution is very
close to vacuum. The computations were performed without any
min-mod type slope limiter (see figure 6) and then with a
limiter (see figure 7). The results obtained without
limitation are good but show a few oscillations, which
disappear when the limiter is used. For more complex test
cases (presence of discontinuities), a limiter is necessary.

Figures 8 and 9 show the results for a 2D hypersonic flow at
Mach 25 and 30 degrees incidence. The mesh was obtained after
three successive refinements. The Roe method could only be
used at first order because at second order, even with strong
slope limitations, negative temperatures appear at the
afterbody. The results obtained with the second-order kinetic
scheme are very good and much better than those obtained with
the Roe scheme. In particular, the wall pressure coefficient
curves (figure 8) are free of oscillations, unlike the results
obtained with the first-order Roe scheme (with entropy
correction).

Figures 10 to 12 show the results for an unsteady flow
entering at Mach 3 a tunnel containing a step. This test case
is very classical and well documented. We compare the results
obtained with the Roe-MUSCL method (with entropy correction)
and with the kinetic scheme (without entropy correction). The
two meshes used are made of triangles (the chosen mesh size is
1/40, which is rather coarse for this case). The second mesh
was refined near the wall in order to eliminate the influence
of the numerical boundary layer. On both meshes, the sonic
expansion is better captured with the kinetic scheme than with
the Roe scheme. In addition, numerous oscillations are visible
on the iso-density curves obtained with the Roe scheme (these
disappear if the slope limitations are strengthened). On the
other hand, with the kinetic scheme, the position of the shock
line after the second reflection is not correct. This defect
seems to be due to the excessive thickness of the numerical
boundary layer. It disappears when the mesh is refined near
the wall. With the Roe scheme, the numerical oscillations
present at the slip line are amplified when the mesh is
refined at the wall.

Figures 13 and 14 show the computation of a steady flow
entering at Mach 2 a tunnel containing a ramp inclined at 15
degrees. The mesh was obtained after three successive
refinements starting from a coarse mesh of 2500 cells. The
refinement criterion used (see section 3) detected all the
waves present in the flow; in particular, the mesh was refined
along the slip line emanating from the triple point located on
the upper wall (see figure 13). Figure 14 shows that refining
the mesh led to a marked improvement in the quality of the
results.

Finally, figures 15 to 17 show 3D numerical results for a
transonic flow (Mach 0.84, incidence 3.06 degrees) around the
ONERA M6 wing. This test case is very well documented in [21].
The initial mesh consists of about 60000 tetrahedra, which is
rather coarse for this type of computation. The final mesh
(figure 20) was obtained after 2 successive refinements. The
results obtained are in full agreement with those of the
various contributors to the AGARD workshop [21]. Here again,
mesh refinement substantially improves the accuracy of the
results.

6  Conclusion

In this paper we have presented a new second-order kinetic
scheme that preserves the positivity of density and
temperature under a CFL condition. The first numerical results
obtained on unstructured meshes are very good and confirm the
theoretical robustness properties of the scheme. In addition,
the discrete entropy estimate associated with the first-order
scheme naturally yields a mesh refinement criterion based on
local entropy production. This criterion appears to be an
excellent candidate for capturing steady discontinuities.

The next step of this study is to extend the scheme to the
computation of reacting flows, for which the robustness of the
numerical method is an essential criterion. This work is in
progress.

figure 6: Sjogreen test case: with no Min-Mod limitations (panels: density, pressure, velocity)

figure 7: Sjogreen test case: with Min-Mod limitations (panels: density, pressure, velocity)

figure 8: Hermes test case: Refined Mesh (10000 cells) and Cp at the wall

figure 10: Medium Mesh (9000 cells) and Refined Mesh (14000 cells)

figure 11: Emery test case: Iso density Lines on the medium Mesh (panels: Kinetic Scheme, Roe scheme)

figure 12: Emery test case: Iso density Lines on the refined Mesh (panels: Kinetic Scheme, Roe scheme)

figure 13: Coarse Mesh (2500 cells) and Refined Mesh (9000 cells)

figure 14: Iso density Lines (panels: Coarse mesh, Refined mesh)

1.  ABSTRACT

This paper describes the results of the research carried
out by the authors in the computer modelling of flow
problems using an approximation based on “clouds of points”
which does not require the definition of a mesh. The
so-called Finite Point Method (FPM) [5] is presented,
showing some examples for the solution of the 1D
convection-diffusion equation and 2D compressible inviscid
flows.

2.  INTRODUCTION 

The  finite  element  method  (FEM)  and  the  finite  vol¬ 
ume  method  (FVM)  are  well  established  numerical 
techniques  whose  main  advantage  is  their  ability  to 
deal  with  complicated  domains  in  a  simple  manner 
while  maintaining  a  local  character  in  the  approxi¬ 
mation.  Both  methods  seek  to  divide  the  total  do¬ 
main  into  a  finite  number  of  subdomains  (or  ele¬ 
ments)  wherein  a  volume  integration  is  performed. 
For  these  reasons  the  subdomains  are  limited  by 
some  regularity  of  geometrical  conditions  such  as 
having  a  positive  volume  or  a  limited  aspect  ratio 
between  elements,  angles,  etc.  Although  this  poses 
no  serious  difficulties  for  2D  situations,  the  lack  of 
efficient  3D  mesh  generators  makes  the  solution  of 
3D  problems  a  difficult  task.

It is widely acknowledged that efficient 3D mesh generation
remains one of the big challenges in FE and FV
computations. Thus, even the more complex problems in CFD,
such as some 3D solutions of the Navier-Stokes equations,
can be accurately tackled nowadays provided an acceptable
3D mesh is available. However, the generation of 3D meshes,
despite major recent advances, is still a bottleneck and it
can absorb far more time and effort than the numerical
solution itself.

Different authors have recently investigated the
possibility of deriving numerical methods without using
meshes. Nayroles et al [1] proposed a technique, which they
call the diffuse element method (DEM), where only some nodes
and a boundary description are necessary to formulate the
Galerkin equations. The interpolating functions are
polynomials fitted to the nodal values by a least squares
approximation. Although no finite element mesh is explicitly
required in this method, some kind of “auxiliary grid” is
still needed in order to compute numerically the integral
expressions deriving from the Galerkin approach. This
requirement may preclude the successful extension of the DEM
to 3D problems.

More  recently,  Belytschko  et  al  [2]  have  proposed  an 
extension  of  the  DEM  which  they  call  the  element- 
free  Galerkin  (EFG)  method.  In  that  work,  gen¬ 
eralized  moving  least  squares  interpolants  typically 
exploited  in  curve  and  surface  fitting  are  used  to  de¬ 
fine  the  local  approximation.  This  provides  addi¬ 
tional  terms  in  the  derivatives  of  the  unknowns  field 
omitted  by  Nayroles  et  al  [1].  In  addition,  a  reg¬ 
ular  cell  structure  is  chosen  as  the  “auxiliary  grid” 
to  compute  the  integrals  by  means  of  a  higher  order 
quadrature.  Finally,  Lagrange  multipliers  are  used 
to  enforce  the  essential  boundary  conditions.  The 
same  approach  has  been  further  generalized  by  Liu 
et  al  [3]  by  introducing  concepts  from  wavelet  theory. 

The use of “clouds of points” to define local
approximations is by no means new and it has enjoyed some
popularity among finite difference (FD) practitioners to
derive generalized FD schemes in arbitrary irregular grids.
Here typically the concept of a “star” of nodes is
introduced to derive FD approximations by means of a local
Taylor expansion using the information provided by the
number and position of nodes contained in each star. These
ideas have been successfully applied in fluid mechanics
under the name of Smooth Particle Hydrodynamics Method. An
extension of these concepts to the solution of high speed
flow problems has recently been attempted by Batina [4].

Paper presented at the AGARD FDP Symposium on “Progress and Challenges in CFD Methods and Algorithms”
held in Seville, Spain, from 2-5 October 1995, and published in CP-578.

In this paper a general methodology for the numerical
solution of high speed flows using a finite set of arbitrary
points is described. The approach proposed incorporates the
main features of generalized finite difference schemes and
other more recent point data based procedures such as the
DEM and the EFG [5]. The theoretical basis of the method in
the context of the solution of viscous and inviscid flows is
described in some detail. The accuracy and applicability of
this method are shown in some examples of application in 1D
and 2D flow problems.

3.  METHODOLOGY 

3.1  The  Finite  Point  method  (FPM) 

From a polynomial expansion of order m a function $u(x)$
can be approximated in a local interpolating domain
$\Omega_i$ (sometimes also termed a “cloud”)

$$ u(x) \approx \hat{u}(x) = a_1 + a_2 x + a_3 x^2 + \dots + a_m x^{m-1} = \mathbf{p}^T(x)\,\mathbf{a} \tag{1} $$

where the base functions are $\mathbf{p}^T = [1, x]$ for m = 2 and
$\mathbf{p}^T = [1, x, x^2]$ for m = 3 in one dimension [5].

The above approximation can now be sampled at n points
within $\Omega_i$ where the values of the unknown
$u_j^h = u(x_j)$ are sought, i.e.

$$ \mathbf{u}^h = \begin{Bmatrix} u_1^h \\ \vdots \\ u_n^h \end{Bmatrix} = \begin{bmatrix} \mathbf{p}_1^T \\ \vdots \\ \mathbf{p}_n^T \end{bmatrix} \mathbf{a} = \mathbf{C}\,\mathbf{a} \tag{2} $$

where $\mathbf{C}$ is an n x m matrix.

3.1.1 Finite element interpretation

If n = m, a standard finite element interpolation is
obtained by inverting eq. (2) and substituting into (1) as

$$ u(x) = \mathbf{p}^T \mathbf{C}^{-1} \mathbf{u}^h = \mathbf{N}^T \mathbf{u}^h \tag{3} $$

$\mathbf{N}$ being the standard finite element shape functions [7].

3.1.2 Least squares interpolation

Increasing the number of nodes in $\Omega_i$ to n > m, we
cannot directly invert $\mathbf{C}$ anymore. However, through
least squares approximation a square matrix is obtained
which can be inverted if $\mathbf{C}$ has full rank, which is
assumed in what follows. Hence, the following sum of squares
can be written using eq. (1)

$$ J = \sum_{j=1}^{n} \left( u_j^h - \hat{u}(x_j) \right)^2 = \sum_{j=1}^{n} \left( u_j^h - \mathbf{p}_j^T \mathbf{a} \right)^2 \tag{4} $$

Figure 1: Nodal unknowns $u^h$ and the interpolated
function $\hat{u}$.

Minimizing J with respect to $\mathbf{a}$,
$\partial J / \partial \mathbf{a} = 0$, yields

$$ \mathbf{a} = \mathbf{A}^{-1} \mathbf{B}\, \mathbf{u}^h \tag{5} $$

with $\mathbf{A} = \mathbf{C}^T \mathbf{C}$ and $\mathbf{B} = \mathbf{C}^T$. The new shape
functions are now obtained as

$$ \mathbf{N}^T = \mathbf{p}^T \mathbf{A}^{-1} \mathbf{B} \tag{6} $$

This means that an interpolated curve $\hat{u}(x)$ is
generated from some point values $u_j^h$ in each cloud as
shown in Figure 1. Note that the fitted curve does not
necessarily pass through the nodal unknowns.

Recently,  Batina  [4]  has  used  a  similar  type  of  least 
squares  fit  for  fluxes  and  stresses  in  compressible  flow 
analysis.  However,  he  avoids  the  direct  inversion  of 
matrix  A  by  doing  a  QR  decomposition. 

In the present approach the danger of $\mathbf{A}$ being
singular is avoided by appropriately selecting the points in
the interpolation domain $\Omega_i$. This reduces both
computational cost and memory.

3.1.3 Weighted Least Squares Approach

A drawback of the interpolation procedure presented so far
is that equal weight is given to all the points in
$\Omega_i$. This can rapidly cause a deterioration of the
approximation [6]. A remedy can be the introduction of
weighting functions, such as a Gauss function, which will be
described next.

Following the least squares approach from above, we can
directly include the weighting functions $w(x_j)$ in
eq. (4):

$$ J = \sum_{j=1}^{n} w(x_j) \left( u_j^h - \hat{u}(x_j) \right)^2 = \sum_{j=1}^{n} w(x_j) \left( u_j^h - \mathbf{p}_j^T \mathbf{a} \right)^2 \tag{7} $$

Again minimising J with respect to $\mathbf{a}$, we obtain

$$ \mathbf{a} = \mathbf{A}^{-1} \mathbf{B}\, \mathbf{u}^h \tag{8} $$

with $\mathbf{A} = \mathbf{C}^T \mathbf{W} \mathbf{C}$ and $\mathbf{B} = \mathbf{C}^T \mathbf{W}$.
$\mathbf{W}$ is now a diagonal matrix containing the weights
$w(x_j)$ at each point in $\Omega_i$.

In [6], the authors demonstrate a strong sensitivity to the
number of points chosen within each cloud $\Omega_i$ if no
weighting functions are used. In an example, the shape
function plots show a drastic deterioration for both linear
and quadratic base functions $\mathbf{p}$.
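As a minimal sketch of the weighted fit in eq. (8), the normal equations for the 1D linear basis $\mathbf{p}^T = [1, x]$ can be solved in closed form. The function name `weighted_linear_fit` and the sample data are illustrative assumptions, not part of the original work; equal weights recover the unweighted fit of eq. (5).

```python
def weighted_linear_fit(xs, us, ws):
    """Return (a1, a2) minimising sum_j w_j * (u_j - a1 - a2*x_j)**2.

    Hypothetical helper: solves A a = B u with A = C^T W C and B = C^T W
    for the linear basis p = [1, x] (m = 2).
    """
    s0 = sum(ws)                                    # A[0][0]
    s1 = sum(w * x for w, x in zip(ws, xs))         # A[0][1] = A[1][0]
    s2 = sum(w * x * x for w, x in zip(ws, xs))     # A[1][1]
    b0 = sum(w * u for w, u in zip(ws, us))         # (B u)[0]
    b1 = sum(w * x * u for w, x, u in zip(ws, xs, us))  # (B u)[1]
    det = s0 * s2 - s1 * s1
    if det == 0.0:
        raise ValueError("A is singular: the cloud needs two distinct points")
    a1 = (s2 * b0 - s1 * b1) / det
    a2 = (s0 * b1 - s1 * b0) / det
    return a1, a2


# Three points with a kink at the central node (x = 0, u = 0).
xs = [-1.0, 0.0, 1.0]
us = [1.0, 0.0, 1.0]
a1_equal, _ = weighted_linear_fit(xs, us, [1.0, 1.0, 1.0])
a1_central, _ = weighted_linear_fit(xs, us, [0.01, 1.0, 0.01])
```

With equal weights the fitted value at the central point is 2/3, far from the nodal value 0; concentrating the weight on the central point pulls the fit towards it, which is the effect the weighting functions are meant to achieve.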

3.2  The  FPM  in  a  one  dimensional  context 

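Let us now apply the theoretical background to a typical
test problem, the linear 1D convection-diffusion equation,
and compare its results to known solutions from the FEM.
Consider the 1D convection-diffusion equation:

$$ \frac{\partial u}{\partial t} + A \frac{\partial u}{\partial x} - \frac{\partial}{\partial x}\left( k \frac{\partial u}{\partial x} \right) = 0 \quad \text{in } \Omega \tag{9} $$

with $u = u(t, x)$ in $\Omega$; $u(t, 0) = u_0$ in $\Gamma_0$,
$u(t, L) = u_L$ in $\Gamma_L$, and $\Gamma = \Gamma_0 \cup \Gamma_L$.

At steady state ($\partial u / \partial t = 0$) equation (9) becomes:

$$ A \frac{du}{dx} - \frac{d}{dx}\left( k \frac{du}{dx} \right) = 0 \tag{10} $$

Taking $A$ and $k$ constant, the analytical solution of
this linear homogeneous differential equation is obtained
as:

$$ u(x) = u_0 + (u_L - u_0)\, \frac{1 - e^{\frac{A}{k} x}}{1 - e^{\frac{A}{k} L}} \tag{11} $$

With $u_0 = 1$ and $u_L = 0$, equation (11) reduces to

$$ u(x) = 1 - \frac{1 - e^{\frac{A}{k} x}}{1 - e^{\frac{A}{k} L}} \tag{12} $$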
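The analytical solution (11) is straightforward to evaluate and can serve as a reference when checking nodal values. The sketch below assumes constant, non-zero $A$ and $k$; the function name `exact_solution` and its default arguments are illustrative choices.

```python
import math


def exact_solution(x, A, k, L=1.0, u0=1.0, uL=0.0):
    """Steady 1D convection-diffusion solution, eq. (11).

    Assumes constant convection velocity A (non-zero) and diffusivity k > 0.
    The defaults u0 = 1 and uL = 0 give eq. (12).
    """
    ratio = (1.0 - math.exp(A * x / k)) / (1.0 - math.exp(A * L / k))
    return u0 + (uL - u0) * ratio


# Sample the boundary values and one interior point.
left = exact_solution(0.0, A=1.0, k=0.1)
right = exact_solution(1.0, A=1.0, k=0.1)
middle = exact_solution(0.5, A=1.0, k=0.1)
```

For convection-dominated cases the solution stays close to $u_0$ over most of the domain and drops to $u_L$ in a thin layer near $x = L$, which is where the numerical schemes discussed next are tested most severely.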

A test for time marching schemes is to solve equation (9)
by iterating until steady state is reached, approximating
the exact result. This is usually done by expanding
equation (9) in time using a Taylor series:

$$ u^{n+1} = u^n + \sum_{s=1}^{\infty} \frac{\Delta t^s}{s!} \frac{\partial^s u}{\partial t^s} \tag{13} $$

A discretization in space must be performed next. First,
known and proven finite element methods will be presented,
and then the finite point method proposed will be described.

3.2.1 Finite element solution

It is well known that the exact solution to this problem
can be nodally reproduced by the finite element method using
the following Petrov-Galerkin method for all ranges of the
Peclet number Pe [7]. This can be achieved by expanding
equation (13) up to first order, replacing
$\partial u / \partial t$ with equation (9) and discretizing in
space using Petrov-Galerkin shape functions:

$$ W = N + \alpha_{opt} \frac{h}{2} \frac{dN}{dx} \tag{14} $$

with the upwind parameter
$\alpha_{opt} = \coth(Pe) - \frac{1}{Pe}$, which is optimal for
this equation. The so-called Taylor-Galerkin approach can
also be used to recover exact nodal values for this problem.
In fact, if equation (13) is expanded up to second order
(omitting third order derivatives) and standard Galerkin
linear shape functions are used, equivalence to the
Petrov-Galerkin scheme can be proved [7] for

$$ \Delta t = \Delta t_{opt} = \alpha_{opt} \frac{h}{A} \tag{15} $$

$$ \text{and} \quad Pe = \frac{A h}{2k} \tag{16} $$

with u=u(x).
Figure 2 shows the exact nodal values obtained using a
Taylor-Galerkin two-step scheme with $\Delta t_{opt}$ and
Pe = 1 [7,8].

Figure 2: Exact solution to the convection diffusion
equation using a Taylor-Galerkin scheme.

3.2.2 The finite point method (FPM)

Let us now analyze the finite point method in the context
of the 1D convection-diffusion equation. Integrating eq. (9)
in time by performing a Taylor expansion, eq. (13), up to
second order leads to:

$$ u^{n+1} = u^n + \Delta t \frac{\partial u}{\partial t} + \frac{\Delta t^2}{2} \frac{\partial^2 u}{\partial t^2} \tag{17} $$

Inserting eq. (9) into (17) and omitting third order
derivatives, we obtain:

$$ u^{n+1} = u^n - \Delta t \left[ A \frac{\partial u}{\partial x} - \frac{\partial}{\partial x}\left( k \frac{\partial u}{\partial x} \right) - \frac{A^2 \Delta t}{2} \frac{\partial^2 u}{\partial x^2} \right]^n \tag{18} $$

The discretization of the computational domain is performed
locally using arbitrary points, without the need for fixed
connectivities in a conventional mesh. Performing a least
squares approximation in the vicinity of a point using n
points, we obtain an estimation of the necessary spatial
derivatives $\partial u / \partial x$ and
$\partial^2 u / \partial x^2$. Substituting these derivatives
into eq. (18) leads to a system of equations from which the
unknown point values $u_j$ can be found for each time
increment. The approach is equivalent to using a point
collocation scheme [5].

As explained earlier, the unknown function $\hat{u}(x)$ and
its derivatives may be expanded as follows in a given cloud,
for the linear case (in what follows we assume
$\hat{u}(x) = u(x)$):

$$ u(x) = a_1 + a_2 x = \mathbf{p}^T \mathbf{a}, \qquad \frac{du}{dx} = a_2 \tag{19} $$

and for the quadratic case:

$$ u(x) = a_1 + a_2 x + a_3 x^2 = \mathbf{p}^T \mathbf{a}, \qquad \frac{du}{dx} = a_2 + 2 a_3 x \tag{20} $$

For the linear case, it is not possible to directly compute
the necessary second order derivatives. This can be overcome
by performing an accumulation of differences at the central
point and the rest of the points j within the cloud. Hence,

$$ \left( \frac{\partial^2 u}{\partial x^2} \right)_1 = \sum_{j=2}^{n} (\,\cdots) \tag{21} $$

with point 1 being the central point.

It can be shown that this scheme, for equally spaced points
(n = 3 and m = 2, 3), is equivalent to central differences,
which in turn is equivalent to FEM with Galerkin shape
functions [8].

Figure 3: Equally spaced points and their domain of
influence for n = 3. Note the derivative is equivalent to a
central difference approximation.

The following shows this for the linear case (m=2).

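Consider three points (1,2,3) with the coordinates
$(x_1, x_2, x_3)$ at equal distance h from each other as
shown in Figure 3, point 1 being the central one, so that
$x_2 = x_1 - h$ and $x_3 = x_1 + h$. Let their unknown values
be $u_1$, $u_2$, $u_3$, respectively. Then, $\mathbf{A}$ from
equation (5) can be calculated as

$$ \mathbf{A} = \mathbf{C}^T \mathbf{C} = \begin{bmatrix} 3 & 3 x_1 \\ 3 x_1 & 3 x_1^2 + 2 h^2 \end{bmatrix} \tag{22} $$

Inversion of $\mathbf{A}$ leads to

$$ \mathbf{A}^{-1} = \begin{bmatrix} \frac{1}{3} + \frac{x_1^2}{2 h^2} & -\frac{x_1}{2 h^2} \\ -\frac{x_1}{2 h^2} & \frac{1}{2 h^2} \end{bmatrix} \tag{23} $$

Eq. (5) gives the polynomial coefficients $a_1$ and $a_2$

$$ \begin{Bmatrix} a_1 \\ a_2 \end{Bmatrix} = \mathbf{A}^{-1} \begin{bmatrix} 1 & 1 & 1 \\ x_1 & x_2 & x_3 \end{bmatrix} \begin{Bmatrix} u_1 \\ u_2 \\ u_3 \end{Bmatrix} \tag{24} $$

The first derivatives in the linear case are constant.
Using eq. (19), (23) and (24) they now become:

$$ \frac{du}{dx} = a_2 = \frac{1}{2 h^2} \left[ (x_2 - x_1)\, u_2 + (x_3 - x_1)\, u_3 \right] = \frac{u_3 - u_2}{2h} \tag{25} $$

which is exactly a central difference approximation. The
second differences are taken as accumulated differences at
the central node 1, which leads to

$$ \frac{d^2 u}{dx^2} = \frac{u_2 - u_1}{h^2} + \frac{u_3 - u_1}{h^2} = \frac{u_2 - 2 u_1 + u_3}{h^2} \tag{26} $$

which also is a central difference approximation. By
analogy, we can derive similar statements for the quadratic
case (m = 3).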
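The equivalence stated in eqs. (25) and (26) can be checked numerically. The following sketch assumes an unweighted linear least squares fit over a three-point cloud with point 1 in the centre; the helper names `lsq_slope` and `accumulated_second_difference` and the sample values are illustrative.

```python
def lsq_slope(xs, us):
    """Slope a2 of the unweighted least squares line through (xs, us)."""
    n = len(xs)
    s1 = sum(xs)
    s2 = sum(x * x for x in xs)
    b0 = sum(us)
    b1 = sum(x * u for x, u in zip(xs, us))
    return (n * b1 - s1 * b0) / (n * s2 - s1 * s1)


def accumulated_second_difference(us, h):
    """Sum of (u_j - u_1) / h**2 over the non-central points, as in eq. (26)."""
    u1 = us[0]
    return sum((uj - u1) / (h * h) for uj in us[1:])


x1, h = 0.7, 0.1
xs = [x1, x1 - h, x1 + h]        # point 1 central, point 2 left, point 3 right
us = [2.0, 1.5, 3.1]

slope = lsq_slope(xs, us)
central_first = (us[2] - us[1]) / (2.0 * h)
second = accumulated_second_difference(us, h)
central_second = (us[1] - 2.0 * us[0] + us[2]) / (h * h)
```

Both pairs agree to rounding error, independently of the value at the central node for the first derivative, as eq. (25) indicates.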

Having shown the equivalence of FPM (n = 3, m = 2, 3) with
FEM, it should be possible to recover nodally exact values
for n = 3 and m = 2, 3 using the finite point method.
Figure 4 demonstrates this for Pe = 1.


Figure  4:  Exact  nodal  values  obtained  by  FPM  for 
Pe  =  1,  n  =  3  and  m  =  2,3. 

However, if the number of points in the local interpolating
domain $\Omega_i$ increases (n > 3), the algorithm
introduces excessive diffusion and the quality of the result
deteriorates, especially near strong gradients.

Figure 5: Gauss weighting function; $w(r_j)$ quickly
decreases as $r_j$ increases (for $\lambda_i c = 1$).

3.2.3 Introduction of weighting functions

It is now of interest to see if the results can be improved
by the use of weighting functions $w_j$ within $\Omega_i$.
The idea is to give additional weight to points close to the
central point and reduce it for points farther away. Within
each $\Omega_i$ we define $w_j$ as a function of the
distance of each point j to the central point:

$$ w_j = w(r_j), \quad \text{and} \quad r_j = |x_j - x_i| \tag{27} $$

A possible choice for weighting could be a Gauss
distribution, which was also used in all following
calculations:

$$ w(r_j) = e^{-\left( \frac{r_j}{c\, \lambda_i} \right)^p} \tag{28} $$

where $\lambda_i$ is a characteristic length in each cloud
$\Omega_i$. c and p are user defined constants to adjust the
sensitivity of the weighting function. Usually, c = 1 and
p = 2 are chosen. Figure 5 displays $w(r_j)$ graphically for
$\lambda_i c = 1$. Further information on the FPM can be
found in [5,6].
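A short sketch of the Gauss weighting function follows. It assumes the form $w(r) = \exp(-(r / (c\lambda))^p)$, which is consistent with the constants c, p and the characteristic length described above; the function name `gauss_weight` is an illustrative choice.

```python
import math


def gauss_weight(r, lam, c=1.0, p=2.0):
    """Assumed Gauss-type weight: exp(-(r / (c * lam))**p).

    r   : distance of a cloud point from the central point (r >= 0)
    lam : characteristic length of the cloud (lam > 0)
    """
    return math.exp(-((r / (c * lam)) ** p))


# Weights for a five-point cloud with unit spacing and c * lam = 1.
distances = [0.0, 1.0, 1.0, 2.0, 2.0]
weights = [gauss_weight(r, lam=1.0) for r in distances]
```

The central point always receives weight 1, and points at twice the characteristic length already contribute less than 2% of that, so distant points have little influence on the fit.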


Figure 6: Convection diffusion equation using linear
base functions and 3 points per cloud for
a) Pe = 0.5, b) Pe = 1 and c) Pe = 2.5.

4.  NUMERICAL  EXAMPLES 

4.1  ID  Convection  diffusion  equation 


Let  us  test  the  solution  of  equation  (18)  using  the 
FPM  without  weighting  and  with  Gaussian  weight¬ 
ing  functions. 


Figure  7:  Convection  diffusion  equation  using  linear 

base  functions  and  4  nodes  per  cloud  for 
a)  Pe  =  0.5,  b)  Pe  =  1  and  c)  Pe  =  2.5. 

Fig.  6  shows  the  FP  solution  using  linear  base  func¬ 
tions  (m  =  2)  and  3  points  per  cloud  for  three  dif¬ 
ferent  Peclet  numbers:  a)  Pe  =  0.5  b)  Pe  =  1.0  and 
c)  Pe  =  2.5.  Observe  that  exact  nodal  values  are 
obtained  in  all  cases. 

However, as the number of nodes in the cloud is increased,
a deterioration of the solution is visible when no weighting
functions are employed. Figures 7 and 8 demonstrate this
behavior for n = 4 and n = 5.

Using Gaussian weighting functions, the improvement of the
solution is impressive. With a lower bound on $\lambda$ set
by $r_{min}$, where $r_{min}$ refers to the minimum distance
r in $\Omega_i$, practically exact nodal values are
recovered for this 1D test problem (see Figures 7 and 8).


Figure  8:  Convection  diffusion  equation  using  linear 
base  functions  and  5  nodes  per  cloud  for 
a)  Pe  =  0.6,  b)  Pe  =  1  and  c)  Pe  =  2.6. 

The extension to quadratic base functions (m = 3) exhibits
a more drastic need for weighting functions. Whereas with
3-noded clouds (n = 3) exact nodal values are computed
(Figure 9), strong oscillations occur as n is increased
(Figure 10). These oscillations disappear if a weighted
interpolation is used.


Figure  9:  Convection  diffusion  equation:  quadratic 
base  functions  and  3  nodes  per  cloud  for 
a)  Pe  =  0.6,  b)  Pe  =  1  and  c)  Pe  =  2.6. 

Additionally, we have also found that the quality of the
results worsens as the Peclet number increases if no
weighting functions are employed. Note that again nearly
exact nodal solutions are recovered by employing Gaussian
weighted interpolation.

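4.2 Extension to the 2D Euler equations
4.2.1 Governing equations

The ideas from the one dimensional problem are extended to
the solution of the non linear two dimensional Euler
equations:

$$ \frac{\partial \mathbf{u}}{\partial t} + \frac{\partial \mathbf{f}_k}{\partial x_k} = 0, \qquad k = 1, 2 \tag{29} $$

where

$$ \mathbf{u} = \begin{Bmatrix} \rho \\ \rho u_1 \\ \rho u_2 \\ \rho e \end{Bmatrix} \quad \text{and} \quad \mathbf{f}_k = \begin{Bmatrix} \rho u_k \\ \rho u_k u_1 + p\, \delta_{1k} \\ \rho u_k u_2 + p\, \delta_{2k} \\ (\rho e + p)\, u_k \end{Bmatrix} $$

The different terms have the usual meaning [8].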


Figure  10:  Convection  diffusion  equation  using  qua¬ 
dratic  base  functions  and  5  points  per 
cloud  for  Pe  =  1. 

4.2.2  Time  discretization 

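A two-step scheme is employed in order to advance the
solution in time towards steady state, i.e.

$$ \mathbf{u}^{n+1/2} = \mathbf{u}^n - \frac{\Delta t}{2} \left. \frac{\partial \mathbf{f}_k}{\partial x_k} \right|^{n}, \qquad \mathbf{u}^{n+1} = \mathbf{u}^n - \Delta t\, \frac{\partial \mathbf{f}_k(\mathbf{u}^{n+1/2})}{\partial x_k} \tag{30} $$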


4.2.3 Stability

The two-step scheme leads to a conditionally stable
explicit second order algorithm with the following limit for
$\Delta t$:

$$ \Delta t = \frac{C\, h}{|\mathbf{v}| + c} \quad \text{and} \quad C < 1 \tag{31} $$

where C is the Courant number.

A difficulty in a multidimensional context arises from the
determination of h in a given cloud of points $\Omega_i$.
In finite elements, using linear triangular elements, h is
defined according to the minimum height within each element
[8]. In meshless methods, a clear definition has not been
presented yet. In our work, h has been taken equal to
$\lambda_{min}$, this being the minimum distance to the
center point within each interpolation domain $\Omega_i$.
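The time step limit (31) with $h = \lambda_{min}$ can be sketched as follows for a 2D cloud. The function name `cloud_time_step`, its argument layout and the sample values are assumptions made for illustration only.

```python
import math


def cloud_time_step(center, neighbours, speed, sound_speed, courant=0.25):
    """Local time step from eq. (31), with h taken as lambda_min.

    center      : (x, y) of the central point of the cloud
    neighbours  : list of (x, y) for the other points of the cloud
    speed       : |v| at the central point
    sound_speed : c at the central point
    courant     : Courant number C, expected to be below 1
    """
    cx, cy = center
    # lambda_min: minimum distance from the central point to any cloud point.
    h = min(math.hypot(x - cx, y - cy) for x, y in neighbours)
    return courant * h / (speed + sound_speed)


dt = cloud_time_step((0.0, 0.0), [(0.3, 0.4), (1.0, 0.0)],
                     speed=1.0, sound_speed=4.0, courant=0.5)
```

Using the minimum distance is a conservative choice: a single pair of closely spaced points restricts the step for the whole cloud.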

4.2.4 Balancing dissipation

Since  the  hyperbolic  Euler  equations  do  not  contain 
any  diffusion  terms,  some  balancing  damping  must 
be  added  to  prevent  unphysical  oscillations.  Follow¬ 
ing  Jameson  [9],  2nd  and  4th  order  diffusion  terms 
are  added  to  the  fluxes.  These  terms  are  constructed 
as  follows  in  the  FPM: 


di  =  -(|v|  -I-  c)  ^(e,-  - 

Ci  =  ef  ^u,-  -  -  Uf)  (32) 

where  Wj  are  the  same  weighting  functions  used  in 
the  interpolations  of  eq.  (8)  and  the  coefficients  of 
eq.  (32)  are  obtained  as: 


P«l  . 

^  E;=3Pf+  ’ 


max(0,a^^^-c^*^) 

(33) 


$\alpha^{(2)}$ and $\alpha^{(4)}$ are user defined
constants. The summation over j extends across the number of
points in each cloud and is accumulated at both the central
point i and the point j. In subcritical flows
$\alpha^{(2)}$ is generally switched off.


4.2.5 Selection of points

In a multidimensional domain, the difficulty arises of how
to define each local interpolating domain. Even though
weighting functions are employed, it is still necessary to
choose the most significant points for each $\Omega_i$. For
the results of this paper, the central point plus the n - 1
closest points are chosen. However, a condition of quadrants
is imposed such that there must be at least one point in
every quadrant of orthogonal axes. This leads to a minimum
of 5 points per cloud.

At the boundary, the two points adjacent to the central
point on the boundary plus the closest points are chosen.
Another condition is that no boundary section is crossed, so
that points from the opposite side are not chosen. For
instance, at the trailing edge of an airfoil, the closest
points to a point on one side of the airfoil may lie across
the wall on the other side. Since there is a physical
separation of these points, they are not included in the
same interpolation domain.
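The interior selection rule can be sketched as below. The text does not say how the quadrant condition is enforced, so this sketch assumes one possible strategy: take the n - 1 closest points, then add the nearest remaining point of any quadrant that is still empty. The names `quadrant` and `select_cloud` are illustrative, points are assumed distinct, and the boundary rule (no crossing of walls) is not handled.

```python
import math


def quadrant(dx, dy):
    """Quadrant index 0..3 of a non-zero offset, using half-open sectors."""
    if dx > 0 and dy >= 0:
        return 0
    if dx <= 0 and dy > 0:
        return 1
    if dx < 0 and dy <= 0:
        return 2
    return 3


def select_cloud(points, i, n):
    """Indices of the cloud of point i: i itself, its n - 1 closest points,
    plus the nearest extra point for every quadrant left empty (assumed rule).
    """
    xi, yi = points[i]
    others = sorted(
        (j for j in range(len(points)) if j != i),
        key=lambda j: math.hypot(points[j][0] - xi, points[j][1] - yi),
    )
    cloud = others[: n - 1]
    covered = {quadrant(points[j][0] - xi, points[j][1] - yi) for j in cloud}
    for j in others[n - 1:]:
        if len(covered) == 4:
            break
        q = quadrant(points[j][0] - xi, points[j][1] - yi)
        if q not in covered:
            cloud.append(j)
            covered.add(q)
    return [i] + cloud


pts = [(0.0, 0.0), (1.0, 0.1), (2.0, 0.1), (3.0, 0.2),
       (-5.0, 1.0), (-5.0, -1.0), (5.0, -1.0)]
cloud = select_cloud(pts, 0, 3)
```

In this example the two closest points lie in the same quadrant, so three more distant points are added to cover the remaining quadrants, while the third closest point is skipped because its quadrant is already represented.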

4.3  2D  Results 
4.3.1  Subsonic  test  case 

The first 2D test case considered is a NACA0012 profile
with a free stream Mach number of 0.5 and 0 degrees angle of
attack, analyzed by Zienkiewicz et al [10,11]. A finite
element solution has been taken for comparison. The meshless
grid of 2556 points is shown in Figure 11. The same points
have been used for the FE solution on the equivalent
unstructured triangular mesh obtained using a standard
advancing front technique [7,8].

Figure 11: Point distribution around a NACA0012 profile

Again,  the  idea  is  to  compare  the  influence  of  the 
weighting  functions  in  the  finite  point  approxima¬ 
tion.  In  previous  reports  [5,6]  we  have  shown  re¬ 
sults  proving  the  superiority  of  weighting  functions 
in  a  2D  context,  but  without  using  weighting  func¬ 
tions  for  the  balancing  diffusion  terms  (see  eq.  (32)). 
Here,  the  benefit  of  the  weighted  diffusion  terms 
shall  be  presented. 


The results of the FPM were obtained by employing 7 nodes
in $\Omega_i$, $\lambda = \lambda_{min}$, c = 1 and linear
base functions (m = 3). A global comparison of the meshless
solution is shown in Figure 12. In a), b), c) and d) the
mesh, the Taylor-Galerkin solution, a four-stage Runge-Kutta
Galerkin result and the FPM solution for the density are
presented, respectively. Qualitatively, all results are very
similar. In Figure 13 close-up views in the stagnation area
enhance the comparison of density contours of a) FPM without
weighted diffusion, b) FPM with full Gauss weighting,
c) RK-Galerkin and d) Taylor-Galerkin. Note the improvement
of solution b) with respect to a): through the use of
weighted diffusion terms it does not exhibit any
oscillations in the stagnation area.

In  Figure  14,  a)  velocity  contours  and  b)  velocity 
vectors  in  the  stagnation  zone  are  displayed,  respec¬ 
tively. 

4.3.2 Supersonic test case

The  second  2D  test  case  is  a  hypersonic  inviscid  flow 
of  Mach  8.15  around  a  double  ellipse,  which  is  well 
documented  by  the  proceedings  of  the  workshop  in 
Antibes,  1991  [12].  The  flow  enters  at  an  angle  of  30 
degrees.  The  solution  is  characterized  by  a  strong 
primary  bow  shock  and  a  weaker  canopy  shock. 

To solve this problem, a grid of approximately 11000 points
was generated, again using the advancing front technique.
Linear base functions (m=3) and 6 point clouds with Gaussian
weighting were used. The residuals of the solution have been
reduced by six orders of magnitude. Figures 15 a), b) and c)
present the meshless grid, Mach number contours and density
lines, respectively. Note that the solution is very smooth
and the location of the shock is well captured. The
numerical overshoot of about 3% in Mach number is within
reasonable limits and it could be improved by increasing the
balancing diffusion. The convergence of this solution was
slow due to a low Courant number of 0.25 (chosen to avoid
negative pressures).

Figure  16  a)  demonstrates  the  high  quality  of  the  so¬ 
lution  in  the  vicinity  of  the  stagnation  area  showing 
no  oscillations.  Figure  16  b)  displays  the  pressure 
coefficient  Cp  on  the  boundary  of  the  double  ellipse 
which  compares  well  to  other  contributors  [12]. 
Figure 12: NACA0012 profile: a) Mesh, b) TG solution, c) RK solution and d) FPM solution are shown.

Figure 13: NACA0012 profile: Close-up density contours in the stagnation area for a) FPM without weighted balancing
diffusion, b) FPM with weighted balancing diffusion, c) RK solution and d) TG solution.

Figure 14: NACA0012 profile: a) velocity contours and b) velocity vectors in the stagnation zone are shown for the FPM
with Gaussian weighted balancing diffusion terms.
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1.  ABSTRACT 

Solution algorithms for the unsteady 2D Euler equations are
presented. Cell-centered upwind control volume schemes are
developed which utilize two-dimensional monotone linear
reconstruction procedures. A new adaptive grid procedure is
proposed to cluster the grid points in regions where they
are most needed. This procedure is generalized for
unstructured grids. Numerical results in the two-dimensional
case are presented for linear and nonlinear convection
problems.

2.  INTRODUCTION 

The numerical simulation of many gas dynamics processes of
practical significance requires the solution of the unsteady
two-dimensional Euler equations in regions of complex
geometry. A typical feature of inviscid gas flow about
bodies, in channels of complex form or in jets is the
presence of interacting shock waves and other gas dynamic
discontinuities [1-3]. For the computation of such flows,
high order schemes of TVD or ENO type have become
widespread. These schemes have a high order of accuracy in
regions where the solution is smooth, capture
discontinuities well and preserve the monotonicity of the
solution. In the present paper a high order version of
Godunov's scheme [4,5] is used for the solution of the Euler
equations.

To improve the efficiency of codes based on TVD and ENO
methods and to resolve local features of a flow,
solution-adaptive grid algorithms can be used. The authors
have developed an adaptive grid algorithm suitable for
structured and unstructured grids. It is based on the
algebraic minimal moments scheme by Connett et al. [6,7]
with cell-centered grid modifications.

The proposed method belongs to the class of moving grid
methods. When these methods are used on structured grids,
strongly skewed cells can be obtained near large gradient
regions. In this case, a 1D procedure along grid lines can
yield large errors due to a decrease in the order of
approximation, so it is necessary to use essentially 2D
reconstruction procedures. Note that on unstructured grids
only 2D reconstruction procedures can be used.


In the present paper linear reconstruction procedures are
considered and one such procedure is developed. It is based
on the 2D algorithm by Tillyaeva [8], well known in Russia,
modified to take into account a wider additional support
(the set of cells needed to determine the coefficients of
the polynomial).

Numerical results are presented in Section 4 to illustrate
the capability of the proposed algorithms.

3. GOVERNING EQUATIONS

The governing equations are the conservation form of the
Euler equations for two-dimensional, unsteady, compressible
flows of a calorically perfect gas

$$ q_t + F(q)_x + G(q)_y = S \tag{1} $$

where

$$ q = \begin{bmatrix} \rho \\ \rho u \\ \rho v \\ E \end{bmatrix}, \quad F(q) = \begin{bmatrix} \rho u \\ \rho u^2 + p \\ \rho u v \\ (E + p)\, u \end{bmatrix}, \quad G(q) = \begin{bmatrix} \rho v \\ \rho u v \\ \rho v^2 + p \\ (E + p)\, v \end{bmatrix} \tag{2} $$

Here $\rho$, $p$ and $E$ are the density, pressure and
total energy, respectively, and $u$ and $v$ are the
Cartesian components of the velocity vector. $S$ is the
source term. The system (1)-(2) of four equations is closed
with the polytropic equation of state

$$ p = (\gamma - 1) \left( E - \frac{\rho}{2} (u^2 + v^2) \right), \tag{3} $$

where $\gamma$ is the ratio of specific heats.

2. CLASS OF HIGH ORDER SCHEMES FOR
NUMERICAL SIMULATION OF GAS DYNAMIC
FLOWS

2.1 Finite volume formulation

Let us denote by $S_{ij}$ a rectangular partition of the x-y
plane, where

$$ S_{ij} = [x_{i-1/2}, x_{i+1/2}] \times [y_{j-1/2}, y_{j+1/2}] $$

with $(x_i, y_j)$ denoting the centroid of each rectangle
$S_{ij}$. With the help of the integral form of equation (1),
the following equation can be obtained for each control
volume $S_{ij}$

$$ \frac{\partial}{\partial t} \bar{q}_{ij}(t) = -\frac{1}{a_{ij}} \left[ f_{i+1/2,j}(t) - f_{i-1/2,j}(t) + g_{i,j+1/2}(t) - g_{i,j-1/2}(t) \right] \tag{4} $$

where $a_{ij} = \Delta x\, \Delta y$ is the area of $S_{ij}$ and

$$ \bar{q}_{ij}(t) = \frac{1}{a_{ij}} \int_{y_{j-1/2}}^{y_{j+1/2}} \int_{x_{i-1/2}}^{x_{i+1/2}} q(x, y, t)\, dx\, dy \tag{5} $$

is the cell average of q over the control volume at time t.
The fluxes f and g are given by

$$ f_{i+1/2,j}(t) = \int_{y_{j-1/2}}^{y_{j+1/2}} F(q(x_{i+1/2}, y, t))\, dy, \tag{6} $$

$$ g_{i,j+1/2}(t) = \int_{x_{i-1/2}}^{x_{i+1/2}} G(q(x, y_{j+1/2}, t))\, dx. \tag{7} $$

Equation (4) can be treated as a system of ordinary
differential equations. Along any t = constant line, the
right-hand side of (4) is a spatial operation in q, and we
rewrite this equation in the abstract operator form

$$ \frac{d}{dt} \bar{q}_{ij}(t) = (L q(t))_{ij} \tag{8} $$

for the purpose of separating the spatial and temporal
discretizations.

2.1.1 Spatial discretization

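To achieve the desired order of accuracy we replace the operator $L$ with a discrete spatial operator $\tilde L$, which approximates $L$ to r-th order:

$$ \tilde L\bar q(t) = L\bar q(t) + O(h^r). \qquad (9) $$

We define $\tilde L$ explicitly by

$$ (\tilde L\bar q(t))_{ij} = -\frac{1}{a_{ij}}\left[ \hat f_{i+1/2,j}(t) - \hat f_{i-1/2,j}(t) + \hat g_{i,j+1/2}(t) - \hat g_{i,j-1/2}(t) \right], \qquad (10) $$

where $\hat f$ and $\hat g$ are approximations of corresponding order to the fluxes f and g in (6) and (7).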
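The flux-difference form of the discrete operator can be sketched in a few lines of array code. This is only an illustration under assumed conventions: the array layout (face fluxes stored with one extra entry in the face-normal direction) and the function name are choices made here, not details given in the paper.

```python
import numpy as np

def spatial_operator(f_hat, g_hat, area):
    """Flux-difference operator on an nx-by-ny block of cells.

    f_hat : face-integrated x-fluxes, shape (nx + 1, ny); f_hat[i, j] sits on face x_{i-1/2}
    g_hat : face-integrated y-fluxes, shape (nx, ny + 1); g_hat[i, j] sits on face y_{j-1/2}
    area  : cell areas a_ij, shape (nx, ny)
    """
    df = f_hat[1:, :] - f_hat[:-1, :]   # f_{i+1/2,j} - f_{i-1/2,j}
    dg = g_hat[:, 1:] - g_hat[:, :-1]   # g_{i,j+1/2} - g_{i,j-1/2}
    return -(df + dg) / area

# A uniform flux field produces no change in the cell averages.
rhs = spatial_operator(np.ones((4, 2)), np.ones((3, 3)), np.full((3, 2), 2.0))
```

Because every interior face flux enters two neighbouring cells with opposite signs, a scheme written this way is conservative by construction.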

For the approximation of the integrals (6), (7) we can use "classical" K-point Gaussian quadrature. Therefore, for fixed x and t, and sufficiently smooth F, the approximation of the flux integral (6) by Gaussian quadrature satisfies

$$ f_{i+1/2,j}(t) = \frac{\Delta y}{2}\sum_{k=1}^{K} w_k\,F(q(x_{i+1/2}, y_k, t)) + \varepsilon(x_{i+1/2}, \eta)\,h^{2K+1}, \qquad (11) $$

where the function $\varepsilon$ relates to the quadrature error and $\eta \in (y_{j-1/2}, y_{j+1/2})$.
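The quadrature of the face flux can be illustrated with NumPy's Gauss-Legendre nodes. The sketch below is an assumption-laden example: `flux` and `q_of_y` are hypothetical callables standing in for the flux function and the solution along the face, and the scalar test function is chosen only to show that K points integrate polynomials up to degree 2K-1 exactly.

```python
import numpy as np

def face_flux(flux, q_of_y, y_lo, y_hi, K=2):
    """Approximate the integral of flux(q(y)) over [y_lo, y_hi] with K Gauss points."""
    nodes, weights = np.polynomial.legendre.leggauss(K)  # nodes and weights on [-1, 1]
    half = 0.5 * (y_hi - y_lo)                           # plays the role of Delta y / 2
    mid = 0.5 * (y_hi + y_lo)
    y_k = mid + half * nodes                             # Gauss points mapped to the face
    return half * sum(w * flux(q_of_y(y)) for w, y in zip(weights, y_k))

# Two points integrate a cubic exactly: the integral of y^3 over [0, 2] is 4.
value = face_flux(lambda q: q, lambda y: y**3, 0.0, 2.0, K=2)
```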

Let R be a spatial operator which reconstructs the set of cell averages and yields a 2D piecewise polynomial $q_h(x,y)$ of degree r-1 which approximates q(x,y,t) with a truncation error of $O(h^r)$:

$$ q_h(x,y) = R(x,y;\bar q(t)). \qquad (12) $$

We then define the "abstract" numerical flux $\hat f$ in (10) by

$$ \hat f_{i+1/2,j}(t) = \frac{\Delta y}{2}\sum_{k=1}^{K} w_k\,F(q_h(x_{i+1/2}, y_k)). \qquad (13) $$

Analysis of (11), (13) and of the flux difference $\hat f_{i+1/2,j}(t) - \hat f_{i-1/2,j}(t)$ in (10) shows [9] that the number of Gauss points K must satisfy the condition $r \le 2K$. The error relation then satisfies

$$ \hat f_{i+1/2,j}(t) - \hat f_{i-1/2,j}(t) = f_{i+1/2,j}(t) - f_{i-1/2,j}(t) + O(h^{r+2}). \qquad (14) $$

Noting that the area $a_{ij}$ is $O(h^2)$, upon substitution of the numerical fluxes (13) into (10) we have thus designed a spatial operator $\tilde L$ that satisfies (9).

We now wish to modify the "abstract" numerical flux (13) so that the scheme retains the desired order of approximation in regions where the solution is smooth and, in addition, the fluxes account for possible discontinuities in q. This modification follows naturally from the reconstruction procedure, by which the function $q_h(x,y)$ in (12) can be discontinuous at cell interfaces, since it is represented piecewise within each cell $S_{ij}$. In order to resolve these discontinuities, the flux integrands in (13) are replaced by

$$ f^R\big(q_{ij}(x_{i+1/2}, y_k),\; q_{i+1,j}(x_{i+1/2}, y_k)\big), $$

$$ g^R\big(q_{ij}(x_k, y_{j+1/2}),\; q_{i,j+1}(x_k, y_{j+1/2})\big), \qquad (15) $$

where $q_{ij}(x,y)$ denotes the local representation of $q_h(x,y)$ within a cell $S_{ij}$ and $f^R(q_1, q_2)$ denotes the flux, across x = 0, associated with the solution to the Riemann problem whose initial states are $q_1$ and $q_2$.

2.1.2  Temporal  discretization 

Equation (4) is discretized by using a Runge-Kutta (R-K) method of Shu [10]:

$$ \bar q^{(l)} = \sum_{m=0}^{l-1}\left[\alpha_{lm}\,\bar q^{(m)} + \beta_{lm}\,\Delta t\,(\tilde L\bar q)^{(m)}\right], \quad l = 1,2,\ldots,p, $$

$$ \bar q^{(0)} = \bar q^{\,n}, \qquad \bar q^{\,n+1} = \bar q^{(p)}. \qquad (16) $$

The order of accuracy, as well as the TVD property, is achieved by adequate sets of the coefficients $\alpha_{lm}$, $\beta_{lm}$ and p [10].

The method of [11] is also used in our code. It is a predictor-corrector method of second order accuracy, in which the fluxes of the first stage are calculated without solving the Riemann problem and the reconstruction procedure is performed only in the first stage.
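For orientation, one widely used member of the TVD Runge-Kutta family is the three-stage, third order scheme written as convex combinations of forward Euler steps. The sketch below shows that scheme; whether the authors' code uses exactly these coefficients is not stated in the text, so the coefficient set and the function name are assumptions.

```python
import numpy as np

def tvd_rk3_step(q, dt, L):
    """One step of the three-stage third order TVD Runge-Kutta scheme.

    L is any callable returning the spatial operator applied to q.
    """
    q1 = q + dt * L(q)                                # first Euler stage
    q2 = 0.75 * q + 0.25 * (q1 + dt * L(q1))          # convex combination, stage 2
    return q / 3.0 + (2.0 / 3.0) * (q2 + dt * L(q2))  # convex combination, stage 3

# Linear decay dq/dt = -q: one step reproduces the cubic Taylor polynomial of exp(-dt).
q_new = tvd_rk3_step(np.array([1.0]), 0.1, lambda q: -q)
```

Each stage is a convex combination of forward Euler updates, which is what lets the TVD property of the Euler step carry over to the full scheme under a suitable time step restriction.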

2.1.3 Riemann solvers

Using the solution of the Riemann problem (RP) to calculate the fluxes over the faces of a control volume (16) makes it possible to take into account the local directions of perturbation propagation. In the general case a genuinely two-dimensional Riemann solver is necessary for accurate calculation of 2D flows. However, the recently proposed 2D Riemann solvers [12,13] are too complicated and tedious, and sometimes result in instability even for first order schemes. In the present paper the solution of the RP is therefore determined by 1D Riemann solvers. According to common practice, which started from the works of S.K. Godunov [5], the normal to a face of a control volume is used as the direction along which the 1D RP is solved. This can make the quality of the solution dependent on the numerical grid.

In the developed computer code [14] the 1D Riemann solver used for the solution of the RP can be chosen (for example the exact solution [4] or the approximate methods of Roe [15], Osher [16], Dukowicz [17], Davis [18] and others [19]).
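To show where a 1D Riemann solver enters, the sketch below evaluates an interface flux from a left and a right state along the x-direction. It deliberately uses the Rusanov (local Lax-Friedrichs) flux as a simple stand-in; this is not one of the solvers named above, and the value of GAMMA and all function names are assumptions for illustration.

```python
import numpy as np

GAMMA = 1.4  # assumed ratio of specific heats

def primitive(q):
    """Return (rho, u, v, p) from q = (rho, rho*u, rho*v, E)."""
    rho, mx, my, E = q
    p = (GAMMA - 1.0) * (E - 0.5 * (mx**2 + my**2) / rho)
    return rho, mx / rho, my / rho, p

def euler_flux_x(q):
    """Physical flux F(q) of the 2D Euler equations in the x-direction."""
    rho, u, v, p = primitive(q)
    return np.array([rho * u, rho * u**2 + p, rho * u * v, (q[3] + p) * u])

def max_wave_speed(q):
    """Largest characteristic speed |u| + c normal to the face."""
    rho, u, v, p = primitive(q)
    return abs(u) + np.sqrt(GAMMA * p / rho)

def rusanov_flux(q_left, q_right):
    """Rusanov flux: central average plus dissipation scaled by the fastest wave."""
    s = max(max_wave_speed(q_left), max_wave_speed(q_right))
    return 0.5 * (euler_flux_x(q_left) + euler_flux_x(q_right)) - 0.5 * s * (q_right - q_left)
```

For a face that is not aligned with the x-axis, the usual approach is to rotate the momentum components into the face-normal frame before calling such a function and to rotate the resulting flux back.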


2.1.4. Boundary conditions

Correct setting of the numerical boundary conditions is one of the most important items in the numerical simulation of unsteady gas dynamic flows. Numerical boundary conditions based on characteristic relations are the most correct in the physical sense and they are implemented in the present work. For open boundaries the "non-reflecting" boundary conditions [20] are used.

2.2. Two dimensional reconstruction algorithms

When solution-adaptive grid methods are used for structured grids, strongly skewed cells can be obtained near regions of large gradients. In this case a 1D procedure along grid lines can yield a large error due to a decrease in the order of approximation, so it is necessary to use essentially 2D reconstruction procedures [8,9,21,22,25,26]. Most of these procedures are rather complicated and very costly. In the present paper the simplest linear reconstruction procedures [8,22] are used. They can be realized efficiently on structured grids.

In the reconstruction procedure of Tillyaeva [8] (denoted below as TL) the five-point stencil is decomposed into triangles. The slopes of the planes passing through the function values at the vertices of each of the four triangles with a vertex at ij are determined. The derivatives of the linear reconstructed function with respect to x and y are evaluated from the corresponding slopes by the minmod operator. As shown in [8], for structured quadrilateral grids only two opposite triangles with a vertex at ij can be used in the reconstruction procedure. Moreover, on rectangular grids the reconstruction becomes a pair of independent 1D minmod limiters.

As follows from the numerical results, the TL reconstruction is too diffusive. We therefore modify it by taking into account the slopes of a central plane, reconstructed over all points of the stencil with an algorithm similar to [21]. We denote this algorithm by MT.

The last algorithm studied in this paper is the linear reconstruction proposed in [22] (denoted below as BJ). Initially an estimate of the solution gradient in the cell is computed using an approximation of the boundary integral over some path surrounding this cell. The gradient estimate obtained is then restricted to satisfy the monotonicity principle.


3. ADAPTATION PROCEDURE

The adaptive grid algorithm presented here is based on the algebraic minimal moment method [6,7] with modifications for cell-centered grids. For exact reconstruction of a linear function, the centroid of a control volume, where the averaged function values are stored, has to coincide with the gravity center of this control volume. In the presented algorithm the vertices of the control volumes are therefore moved first, and then the coordinates of the gravity centers of the new control volumes are evaluated.

An analytical expression for the movement of cell vertex P is given as


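$$ \mathbf{r}_P = \frac{\sum_n m_n\,\mathbf{r}_n}{\sum_n m_n}, \qquad (17) $$

where $\sum$ denotes summation over all neighbouring cells with a vertex at P and $\mathbf{r}_n$ is the position vector of the gravity center of the corresponding cell.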

The "mass" of cell n is defined as

$$ m_n = \Big( \max_j |f_n - f_j| \;\ldots\; + c \Big)\,\ldots \qquad (18) $$

where j is the index of the cells adjacent to cell n, $\Delta s_n$ is the cell area and

$$ M = \underset{nb}{\ldots} \qquad (19) $$

The first term within the parentheses of equation (18) represents an estimate of the maximum gradient of an arbitrary function f on a grid cell. The adaptation will be sensitive to the gradient of this function. The user-specified constant c controls the adaptation strength.

For the adjustment of boundary points the two algorithms proposed in [6,23] are used.

In the method of minimal moments [6,7] the nine-point stencil is decomposed into triangles. The gravity center of each of the four triangles with a vertex at ij is determined, and a weighting function value is computed for these points. The new location of point ij is at the common center of mass of these four triangles and is given by (17) with summation over these triangles. In the presented method mainly "natural" information is used: the gravity centers and areas of the computational cells, which are computed and stored in any case for the flow field computation. In some cases an estimate of the maximum gradient in the weighting function (18) can be obtained as an auxiliary result of the reconstruction procedure. This can be rather important for the overall efficiency of a flow solver with dynamic solution-adaptive grid techniques.
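The center-of-mass idea behind the vertex movement can be sketched as follows. The relocation formula is the ordinary mass-weighted mean of the neighbouring gravity centers; the weight function `cell_mass` is a hypothetical stand-in (largest jump to a neighbour plus a constant, scaled by the cell area) and should not be read as the exact expression (18) of the paper.

```python
import numpy as np

def cell_mass(f_n, f_neighbours, area, c=1.0):
    """Hypothetical cell weight: (largest jump of f to a neighbouring cell + c) * area."""
    jump = max(abs(f_n - f_j) for f_j in f_neighbours)
    return (jump + c) * area

def move_vertex(centers, masses):
    """New vertex position: mass-weighted mean of the gravity centers sharing the vertex."""
    centers = np.asarray(centers, dtype=float)  # shape (n_cells, 2)
    masses = np.asarray(masses, dtype=float)    # shape (n_cells,)
    return (masses[:, None] * centers).sum(axis=0) / masses.sum()

# Four cells around a vertex; the heavier cell at the origin attracts the vertex.
centers = [[0.0, 0.0], [1.0, 0.0], [1.0, 1.0], [0.0, 1.0]]
new_position = move_vertex(centers, [3.0, 1.0, 1.0, 1.0])
```

A larger constant c makes all weights more alike and therefore weakens the adaptation, which matches the role the text assigns to c.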

The presented method can easily be generalized to unstructured cell-centered grids. The illustrations in Fig. 1 show how the presented algorithm can be used on structured quadrilateral and unstructured triangular grids. Fig. 1a presents a surface plot of the given function $H(x,y) = \tanh(3(x - y^2)) + \tanh(3(x^2 + y^2 - 1))$. Figures 1b and 1c contain the solution-adaptive grids produced by our algorithm.

4.  NUMERICAL  RESULTS 

4.1.  Linear  scalar  2D  problems 

The numerical dissipation of explicit high resolution schemes with various "reconstruction" algorithms was compared using numerical results for the two dimensional advection equation [24]

$$ u_t + (a(y)u)_x + (b(x)u)_y = 0 \qquad (20) $$

with

$$ a(y) = -(y - y_0)\,\omega, \qquad b(x) = (x - x_0)\,\omega. \qquad (21) $$

The exact solution of (20), (21) consists in the rotation of the initial values around $(x_0, y_0)$ with angular velocity $\omega$. Two series of calculations are presented in this paper. A cut-out cylinder and a cone were chosen as initial values (Fig. 2). We used the angular velocity $\omega = 0.1$ and $x_0 = 50$, $y_0 = 50$. The region of computation was [0,100] x [0,100]. The numerical calculations were done on three types of grid with 100 grid points in each direction. The first type is a uniform rectangular grid; the second type is a smooth curvilinear grid (Fig. 3a) described by the transformation

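where $\xi_i = \Delta x\,(i - 1)$, $\Delta x = 1$, $\eta_j = \Delta y\,(j - 1)$, $\Delta y = 1$.

The third type is a random grid (Fig. 3b)

$$ x_{ij} = \xi_i + \varepsilon_{ij}\,\Delta x, \qquad y_{ij} = \eta_j + \theta_{ij}\,\Delta y, $$

where $\varepsilon_{ij}$ and $\theta_{ij}$ are uniformly distributed random numbers on (-0.4, 0.4).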
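A random grid of this kind can be generated as in the sketch below. The seed, the function name and the decision to perturb boundary points as well as interior points are assumptions; the text does not say how the boundary of the random grid was treated.

```python
import numpy as np

def random_grid(n=100, dx=1.0, dy=1.0, amplitude=0.4, seed=0):
    """Perturb a uniform n-by-n grid by uniform random offsets in (-amplitude, amplitude)."""
    rng = np.random.default_rng(seed)
    xi = dx * np.arange(n)    # xi_i = dx * (i - 1) for i = 1..n
    eta = dy * np.arange(n)   # eta_j = dy * (j - 1) for j = 1..n
    X, Y = np.meshgrid(xi, eta, indexing="ij")
    eps = rng.uniform(-amplitude, amplitude, size=X.shape)
    theta = rng.uniform(-amplitude, amplitude, size=Y.shape)
    return X + eps * dx, Y + theta * dy

x, y = random_grid()
```

Because the offsets stay below half a grid spacing, neighbouring points cannot swap order along either grid direction.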

At time $t = 20\pi$ the initial values have carried out one full rotation and returned to their initial position. The approximations of the initial values on the uniform grid are shown in Fig. 2. To improve picture resolution, Figs. 2 and 4 show only the part [50,100] x [25,75] of the computational region. The size and initial position of the cut-out cylinder and the cone are the same as in [24]. We perform long time calculations until $t = 120\pi$, which corresponds to six full rotations of the initial values. As mentioned in [24], these problems are well suited to benchmark the numerical properties of the schemes.
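Since the velocity field of a rigid rotation is divergence-free, the exact solution is obtained by rotating each point backwards through the angle $\omega t$ and evaluating the initial data there. The sketch below assumes the rotational field a = -(y - y0)ω, b = (x - x0)ω with ω = 0.1 and center (50, 50); the cone position and radius are hypothetical, since the text only refers to [24] for them.

```python
import numpy as np

OMEGA, X0, Y0 = 0.1, 50.0, 50.0  # angular velocity and center of rotation

def cone(x, y, xc=75.0, yc=50.0, radius=15.0):
    """Hypothetical cone-shaped initial profile of unit height."""
    r = np.hypot(x - xc, y - yc)
    return np.maximum(0.0, 1.0 - r / radius)

def exact_solution(u0, x, y, t):
    """Evaluate the initial data at the point rotated back by the angle OMEGA * t."""
    c, s = np.cos(OMEGA * t), np.sin(OMEGA * t)
    xb = X0 + (x - X0) * c + (y - Y0) * s
    yb = Y0 - (x - X0) * s + (y - Y0) * c
    return u0(xb, yb)

# After one period 2*pi/OMEGA = 20*pi the profile is back where it started.
x, y = np.meshgrid(np.linspace(0.0, 100.0, 21), np.linspace(0.0, 100.0, 21), indexing="ij")
u_period = exact_solution(cone, x, y, 20.0 * np.pi)
```

Comparing a computed solution with `exact_solution` after one or six periods gives error measures of the kind reported in the tables.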

In this paper the three reconstruction procedures described in section 2.2 are compared. Numerical results obtained with these reconstructions on the three types of computational grid are presented in Tab. 1 for the cone and in Tab. 2 for the cut-out cylinder. These tables contain the numerical solution errors in the L1 norm and the maximum values of the solutions obtained. Fig. 4 shows the numerical solutions computed on the uniform rectangular grid: TL reconstruction results are shown in Figs. 4a and 4d, MT results in Figs. 4b and 4e, and BJ results (the reconstruction of [22]) in Figs. 4c and 4f. As expected, the TL reconstruction procedure is the most dissipative. The MT and BJ results are rather close, but in the MT case the errors are a little smaller and the maximum values are a little greater. These problems also show that the MT reconstruction may not preserve symmetry. We had some difficulties with the implementation of the BJ reconstruction on curvilinear grids, so results for this grid are not presented in Tabs. 1 and 2.

4.2. A channel with a 15° compression-expansion ramp

The next case is the flow through a duct with a compression-expansion ramp in the bottom wall. The conditions for this case are $M_\infty = 2$, $\gamma = 1.4$. The computational grid is equally spaced and contains 180 cells in the streamwise direction and 60 cells in the cross-flow direction.

The computed Mach contours are presented in Fig. 4. Note that the induced and reflected shocks are quite thin and that unphysical oscillations are absent. All characteristic features of the flow are well resolved on such a fine grid without adaptation.

4.3. An oblique shock-reflection problem

One of the most popular problems for checking various elements of a numerical algorithm (such as reconstruction, adaptation etc.) is the regular reflection of an oblique shock wave by a flat plate. In Figs. 5 a-f, results are shown for a case with $M_\infty = 2.9$ and $\beta = 29^\circ$, where $\beta$ is the angle between the incident shock wave and the flat plate (Fig. 5a). First, a steady solution was obtained on the uniform 60x20 rectangular grid. The corresponding pressure contours are presented in Fig. 5a. Then grid adaptation was performed by the method proposed above, with the pressure used as the adaptation function. After that a new steady state was obtained. Figs. 5a and 5b depict the adapted grid and the associated steady state flow solution. These figures show that the pressure gradients become much better resolved.

4.4. An underexpanded jet flow

The next case is the unsteady underexpanded supersonic jet flow. The conditions for this case are $M_j = 1.5$, $n = p_j/p_\infty = 2$, $T_j = T_\infty$ and $\gamma_j = \gamma_\infty = 1.4$. The computational grid is equally spaced and contains 180 points in the streamwise direction and 80 points in the crossflow direction. Fig. 6 shows the computed Mach contours at the most characteristic moments of time. Note that the nonreflecting boundary conditions make it possible to calculate such a rather complicated flow almost without unphysical reflections at the open boundaries.

The steady state solution is shown in Fig. 7a. Fig. 7b shows the steady state solution obtained using a coarse 90x40 rectangular grid. For this solution the grid adaptation was performed using the Mach number gradients as the adaptation function. In Fig. 7c the adaptive grid is presented and Fig. 7b shows the computed Mach contours. The solution computed on the coarse adapted grid is much closer to the fine grid solution in Fig. 7a, but some flow features are not captured. The second Mach stem is located somewhat farther from the nozzle exit than in Fig. 7a, whereas the first Mach stem is resolved better than on the fine grid. This is the main deficiency of moving solution-adaptive grid algorithms: if there are large gradients of parameters in one flow region, the grid becomes too coarse in regions of moderate and low gradients.

4.5.  Nozzle  flow 

In the last example we present results of the numerical simulation of the internal axisymmetric nozzle flow of an ideal gas with $\gamma = 1.12$. The nozzle consists of two parts: the first part is a Laval nozzle and the second is a cylindrical tube adjoining the supersonic part of the Laval nozzle.

Initially the steady state solution was obtained on a simple 130x40 grid (Fig. 8a). In Figs. 8a and 8b the top half of each figure shows the computed Mach contours and the bottom half shows the computational grid. For the computed steady state solution the grid adaptation was performed using the Mach number gradients as the adaptation function. The new steady state solution and the adaptive grid are presented in Fig. 8b.

5.  CONCLUSIONS 

In the present paper an upwind monotone numerical method for the solution of the Euler equations is presented. The method is based on a high order version of Godunov's scheme and is realized on both structured quadrilateral and unstructured triangular grids. Essentially 2D reconstruction procedures make it possible to perform calculations on strongly skewed grids. Some features of the 2D reconstruction procedures are studied by solving a linear scalar problem. A new solution-adaptive grid algorithm is proposed.

The numerical results presented illustrate the capability of the proposed algorithms. It can be seen that the grid adaptation procedure makes it possible to obtain significantly more accurate results.
Reconstruction | Uniform grid           | Curvilinear grid       | Random grid
               | L1 error     Max value | L1 error     Max value | L1 error     Max value
TL             | 0.0616481    1.48475   | 0.0675634    1.22261   | 0.0640970    1.43263
MT             | 0.0216717    2.80456   | 0.0245567    2.45170   | 0.0257393    2.85844
BJ             | 0.0246873    2.74293   | -            -         | 0.0261619    2.65865

Table 1. Results of solution of the linear scalar problem for rotating cone.


Reconstruction | Uniform grid           | Curvilinear grid       | Random grid
               | L1 error     Max value | L1 error     Max value | L1 error     Max value
TL             | 0.198722     2.10699   | 0.188287     2.05245   | 0.195201     2.02538
MT             | 0.206402     3.66130   | 0.201828     3.39611   | 0.208682     3.58780
BJ             | 0.206362     3.18942   | -            -         | 0.203615     3.04373

Table 2. Results of solution of the linear scalar problem for rotating cut-out cylinder.


Fig. 6. Pressure contours for oblique shock reflection problem (M=2.9, $\beta$=29°).
a) solution on uniform grid; b) solution on adaptive grid; c) solution-adaptive grid.

Fig. 9. Nozzle flow problem. a) solution and grid without adaptation; b) adaptive solution and grid.
1. SUMMARY

The  goal  of  the  present  investigation  is  to  discover  the  effects 
of  certain  parameters  in  a  modern  TVD  scheme  on  the 
solution  of  a  viscous  flow  problem.  This  report  includes 
details  of  the  TVD  scheme  used  in  this  study.  The  scheme  is 
an  extension  of  the  work  of  H.  C.  Yee  [1],  and  uses  an  upwind 
weighted  dissipation  term,  and  central  differencing  to 
calculate  the  viscous  terms. 

The  entropy  correction  parameter  and  the  choice  of  flux 
limiter  when  computing  viscous  flows  are  under  investigation. 
The  effectiveness  of  this  TVD  scheme  in  solving  viscous  flow 
problems  has  recently  been  questioned  by  Lin  [2].  However, 
this  investigation  shows  that  by  carefully  selecting  the  limiter 
and  the  value  of  the  entropy  parameter,  adequate  viscous  flow 
results  can  be  obtained. 

Solutions  to  the  Navier-Stokes  equations  for  an 
underexpanded  sonic  jet  on  a  flat  plate  with  a  supersonic 
crossflow  are  used  to  illustrate  the  method.  The  test 
conditions were $M_\infty = 2.61$ and Re = 749,000, and the boundary
layer  was  considered  to  be  laminar  everywhere.  The 
numerical  code  has  been  evaluated  for  this  test  case  using 
experimental  data  presented  by  Zukoski  and  Spaid  [3]. 

The  study  includes  a  qualitative  analysis  of  the  amount  of 
artificial  viscosity  added  by  the  TVD  algorithm  compared  to 
the  real  viscosity;  an  investigation  of  the  effects  of  the 
artificial  viscosity  term  on  the  solution,  including  changes  in 
pressure  and  skin-friction  distribution  along  the  surface  of  the 
flat  plate,  and  the  change  in  the  boundary  layer  separation 
point  for  different  values  of  the  entropy  parameter. 

2.  INTRODUCTION 

Over  the  past  decade  a  vast  range  of  Total  Variation 
Diminishing  (TVD)  schemes  have  become  available,  and  have 
been  widely  used.  These  schemes  use  some  method  of 
intelligent  switching  to  put  artificial  dissipation  into  a  problem 
where  it  is  needed  or  to  conversely  remove  artificial 
dissipation  from  areas  of  a  problem  where  it  is  not.  In  this 
way the shock capturing capabilities of Euler solving
numerical schemes have improved enormously.

When  the  Navier-Stokes  equations  are  being  solved  however, 
the  interaction  between  the  real  viscosity  and  the  artificial 
viscosity,  provided  by  these  modern  schemes,  must  be 
considered  and  evaluated. 

A  great  deal  of  work  has  been  done  on  various  aspects  of  the 
artificial  dissipation  added  in  a  scheme,  and  how  it  affects  the 
solution.  Allmaras  [4]  has  examined  a  boundary  layer  in  a 


subsonic  flow,  and  reports  that  upwind  schemes  using  a  matrix 
dissipation  technique,  are  generally  better  than  central 
difference  schemes  using  a  scalar  dissipation  formulation,  for 
producing  good  boundary  layer  profiles.  He  also  reports  a 
slight  velocity  overshoot  within  the  boundary  layer,  even 
using  the  upwind  schemes.  Tatsumi  et  al  [5]  have  concluded 
that  scalar  switching  schemes  can  produce  good  accuracy  if  a 
flux limiting technique is included. Tatsumi et al also
suggest that anti-diffusive schemes will produce overshoots
as  described  in  [4]  unless  a  flux  limiter  is  included  in  the 
algorithm.  Caughey  and  Varma  [6]  have  used  an  integral 
technique  to  measure  the  effects  of  artificial  dissipation  on  the 
flow  calculations  around  a  transonic  aerofoil.  They  found  that 
the  dissipation  errors  and  the  total  numerical  errors  were  of  a 
comparable  magnitude,  but  in  general  the  dissipation  errors 
were  larger  than  the  total  numerical  errors.  They  also  found 
that  a  Mach  number  scaling  technique,  for  reducing  the 
artificial  dissipation  being  added,  did  reduce  the  dissipation 
related  errors  in  most  cases.  Turkel  and  Vatsa  [7]  have 
compared  a  scalar  artificial  dissipation  model  with  a  matrix 
dissipation  model  on  a  3D  transonic  aerofoil.  They  found  that 
the  matrix  dissipation  model  improved  the  accuracy  of  the 
scheme.  The  accuracy  was  comparable  to  that  produced  by  an 
upwind TVD scheme. It has also been reported by Swanson and Turkel [8] that the matrix dissipation technique in a central difference scheme can produce high resolution results in a viscous flow, and that the matrix dissipation is essential for this behaviour.

In  this  work,  a  TVD  artificial  dissipation  switching  method 
which  is  similar  to  a  matrix  dissipation  technique  has  been 
studied.  This  switch  is  controlled  by  two  parameters,  the 
choice  of  flux  limiter,  and  the  value  of  the  entropy  correction 
parameter,  which  has  an  indirect  effect. 

In  order  to  study  the  effects  of  these  terms  we  have  considered 
a  complex  viscous  flow  problem,  as  further  irregularities 
caused  by  artificial  dissipation  may  present  themselves  in  such 
a  problem.  An  underexpanded  jet  interaction  with  a 
supersonic  crossflow  has  been  chosen.  This  problem  includes 
a  separated  boundary  layer  and  regions  of  recirculation,  see 
figure 1. The effect of the artificial dissipation terms on these
flow phenomena has not been examined before.

Jet  injection  into  a  crossflow  is  an  important  aerodynamic 
problem  and  has  many  uses  in  the  aerospace  industry. 
Reaction  control  systems  on  space  vehicles  and  missiles  are 
transverse  jet  problems.  Jets  for  vortex  control  on  the 
forebody  are  now  being  considered  for  high  angle  of  attack 
control systems for fighter aircraft. Fuel injection in supersonic combustion ramjets (scramjets) can also be considered as a jet in supersonic crossflow problem.

Figure 1. Diagram of a jet interaction with a supersonic crossflow.

The  flowfield  around  a  jet  injection  is  shown  in  figure  1 .  An 
underexpanded  jet  transversely  injected  from  the  wall,  into  a 
supersonic  main  air  flow,  expands  rapidly  through  the  strong 
Prandtl-Meyer  fans  and  forms  a  Mach  disk.  A  bow  shock 
wave  is  formed  upstream  of  the  injector  due  to  the  interaction 
between  the  main  and  injected  flows  while  the  main  flow 
bends  the  injected  flow  parallel  to  the  wall.  A  boundary  layer 
separation  occurs  and  a  weak  separation  shock  wave  appears 
upstream  near  the  injector  due  to  an  adverse  pressure  gradient 
caused  by  the  injection  in  a  boundary  layer  flow.  Another 
weak  shock  appears  downstream  near  the  injector  due  to  a 
reattachment  of  the  injected  flow  with  the  wall. 

The  objectives  of  this  present  study  were  to  measure  the  effect 
of  the  artificial  dissipation  switching  algorithm  on  the 
accuracy  of  the  solutions  produced  for  this  test  case.  Also,  a 
quantitative  and  qualitative  examination  of  the  amount  of  real 
viscosity  compared  with  the  artificial  viscosity  being  added  in 
the  solution  has  been  included. 

3.  NUMERICAL  METHOD 
3.1 Governing Equations

For a Cartesian coordinate system the Navier-Stokes equations can be written in a conservative form, in which the viscous stress and energy terms are

$$ \tau_{xx} = \mu\left(\frac{4}{3}\frac{\partial u}{\partial x} - \frac{2}{3}\frac{\partial v}{\partial y}\right) \qquad (4a) $$

$$ \tau_{xy} = \mu\left(\frac{\partial u}{\partial y} + \frac{\partial v}{\partial x}\right) \qquad (4b) $$

$$ \tau_{yy} = \mu\left(-\frac{2}{3}\frac{\partial u}{\partial x} + \frac{4}{3}\frac{\partial v}{\partial y}\right) \qquad (4c) $$

$$ f = u\tau_{xx} + v\tau_{xy} + k\frac{\partial T}{\partial x} \qquad (4d) $$

$$ g = u\tau_{xy} + v\tau_{yy} + k\frac{\partial T}{\partial y}; \qquad (4e) $$

p is pressure, $\rho$ is density, e is total internal energy, u and v are velocity components, $\mu$ is viscosity, T is temperature, and k is the heat transfer coefficient.

For a general coordinate system the equations are transformed to the following:

J is the Jacobian of the general coordinate system.
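The viscous stress relations (4a)-(4c) translate directly into code. The sketch below is a plain transcription under the assumption that the four velocity gradients are already available (for example from central differences, as used for the viscous terms in this scheme); the function name is hypothetical.

```python
def viscous_stresses(mu, du_dx, du_dy, dv_dx, dv_dy):
    """Return (tau_xx, tau_xy, tau_yy) for a 2D Newtonian fluid with viscosity mu."""
    tau_xx = mu * (4.0 / 3.0 * du_dx - 2.0 / 3.0 * dv_dy)
    tau_xy = mu * (du_dy + dv_dx)
    tau_yy = mu * (-2.0 / 3.0 * du_dx + 4.0 / 3.0 * dv_dy)
    return tau_xx, tau_xy, tau_yy

# Pure shear du/dy = 1 produces only a shear stress equal to mu.
stresses = viscous_stresses(mu=2.0, du_dx=0.0, du_dy=1.0, dv_dx=0.0, dv_dy=0.0)
```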


In  order  to  construct  the  total  variation  diminishing  scheme, 
the  Navier-Stokes  equations  must  be  put  into  the  following 
form, 
where A and B are the Jacobian matrices for the transformation of F and G, respectively. A and B can therefore be described as

$$ A = \frac{\partial F}{\partial U} = R_\xi \Lambda_\xi R_\xi^{-1}, \qquad B = \frac{\partial G}{\partial U} = R_\eta \Lambda_\eta R_\eta^{-1}, \qquad (8) $$

where the columns of the matrices $R_\xi$ and $R_\eta$ are the right eigenvectors of the matrices A and B. $\Lambda_\xi$ and $\Lambda_\eta$ are vectors which are related to the eigenvalues of A and B, and can be found in reference [9].


3.2  Discretisation  of  the  Equations 

The Navier-Stokes equations, in the form shown in equation 5, can be simply discretised into a two step scheme. The first step solves the flow in the $\xi$-direction, and the second step solves in the $\eta$-direction. This method of discretisation can lead to second order accuracy in time.


(9a) 


A global time step, $\Delta t$, is calculated with a CFL number of 0.95. The numerical flux functions, F and G, in equation 9a are calculated using a finite volume approach.
Note that the flux functions with a superscript * are calculated using U*. In reference [1], Yee has offered several other methods of calculating the flux functions, but they are not covered here.


Roe's averaging is applied to cell vertex points in order to calculate the flow variables at (i+1/2, j) and (i, j+1/2).

The vectors $\Phi_\xi$ and $\Phi_\eta$ in equation (10) contain the anti-diffusive terms that are under investigation in this report. They provide a second order upwind weighted dissipation term, see reference [1]. The equation for the components of the vector $\Phi_\xi$ is shown here; a similar relation can be formulated for the vector $\Phi_\eta$.


The function $\sigma(z)$ is defined as

$$ \sigma(z) = \tfrac{1}{2}\big[\psi(z) - \ldots \qquad (14) $$

The function $\psi(z)$ is calculated using

$$ \psi(z) = \begin{cases} |z|, & |z| \ge \delta \\ (z^2 + \delta^2)/2\delta, & |z| < \delta \end{cases} \qquad (15a) $$
This function is introduced to prevent non-physical solutions such as expansion shocks when one of the eigenvalues goes to zero. The function introduces a small amount of artificial viscosity. A relationship is provided so that $\delta$ is suitably scaled for highly skewed grids. This relation is

$$ \delta = \tilde\delta\,\big[\,u + v + 0.5\,c\,(\ldots \qquad (15b) $$

where u and v are the co-variant velocities. A study of the actual effects of varying the value of the constant $\tilde\delta$ on the solution appears later in this report. This constant is referred to as the entropy parameter throughout this report.
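The entropy correction can be illustrated with the commonly used Harten-type form, which replaces |z| by a parabola when |z| falls below the threshold. The sketch assumes that form and a scalar threshold `delta`; the grid-dependent scaling of the threshold is not reproduced here.

```python
def entropy_fix(z, delta):
    """Harten-type entropy correction: |z| away from zero, a smooth parabola near zero."""
    a = abs(z)
    if a >= delta:
        return a
    return (z * z + delta * delta) / (2.0 * delta)

# At a vanishing eigenvalue the corrected value is delta / 2 instead of zero.
value_at_zero = entropy_fix(0.0, 0.2)
```

Since the corrected value never drops below delta / 2, a larger entropy parameter adds more numerical dissipation wherever an eigenvalue is small, for example across a boundary layer where the wall-normal velocity is close to zero.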

The term g in equations (11) and (13) is the flux limiter. Five different limiters have been implemented and investigated; they are given in reference [1], and are:

$$ g^l_{i,j} = \mathrm{minmod}\big(\alpha^l_{i-1/2,j},\; \alpha^l_{i+1/2,j}\big) \qquad (16a) $$

$$ g^l_{i,j} = S\cdot\max\big[0,\; \min\big(2|\alpha^l_{i+1/2,j}|,\; S\cdot\alpha^l_{i-1/2,j}\big),\; \min\big(|\alpha^l_{i+1/2,j}|,\; 2S\cdot\alpha^l_{i-1/2,j}\big)\big], \qquad (16e) $$

$$ S = \mathrm{sign}\big(\alpha^l_{i+1/2,j}\big) $$

The minmod function of a list of arguments is equal to the smallest argument in absolute value if all the arguments are of the same sign, or is equal to zero if any arguments are of opposite sign.
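The verbal definition of minmod maps onto a few lines of code. The sketch below accepts any number of arguments; treating a zero argument as a sign change (and returning zero) is an assumption, since the text does not cover that case.

```python
def minmod(*args):
    """Smallest-magnitude argument if all share one sign, otherwise zero."""
    if all(a > 0 for a in args):
        return min(args)
    if all(a < 0 for a in args):
        return max(args)  # the negative value closest to zero
    return 0.0

limited = minmod(1.0, 2.0, 0.5)   # all positive
flat = minmod(1.0, -1.0)          # opposite signs, e.g. at a local extremum
```

Returning zero at a sign change is what switches the limiter off at local extrema, which is where the scheme falls back to its first order, most dissipative form.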

In  equation  (16c)  e  is  included  to  stop  any  division  by  zero,  e 
is  given  a  small  value,  usually  of  the  order  of  10'^. 

The  accuracy  and  shock  resolution  of  Euler  solutions  increases 
going  from  limiters  (16a)  to  (16e).  Throughout  the  rest  of  this 
report  equations  (16a)  to  (16e)  are  referred  to  as  Limiters  1  to 
5,  respectively. 
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where  ‘a’  denotes  the  eigenvalues  of  matrix  A,  and  the 
components  of  the  vector  a  can  be  obtained  using, 

®‘i  +  l/2.j  +  (12) 

For the function γ one finds,


3.3  Grid  Generation 

The  grid  used  for  this  problem  was  calculated  using  a  very 
simple  analytical  approach  followed  by  a  short  smoothing 
operation.  It  is  necessary  to  have  a  high  concentration  of 
points  at  the  jet  exit,  where  large  changes  in  the  flow  will 
occur,  and  near  the  wall,  in  order  to  accurately  capture  the 
boundary  layer. 

The  grid  generator  introduces  regions  of  highly  concentrated 
points  where  specified.  This  leads  to  the  production  of  a  grid 
which  is  highly  irregular.  Rapid  changes  in  grid  point 
concentration  can  cause  the  numerical  code  to  fail,  or  spurious 
glitches  in  the  solution  of  the  flow  to  occur. 

Figure  2.  Grid  generated  for  the  transverse  jet  case. 


A  Laplacian  smoothing  operator  is  applied  to  the  grid  to  solve 
these  problems.  Up  to  15  smoothing  iterations  are  performed 
on  the  grid.  The  number  of  iterations  executed  depends  on  the 
degree  of  irregularity  of  the  grid,  and  the  number  of  points  in 
the  regions  where  there  is  a  high  concentration  of  points.  The 
grid  generated  for  this  test  case,  using  this  method  is  shown  in 
figure  2. 
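
A Laplacian smoothing pass of the kind described above can be sketched as follows. This is a minimal illustration, not the grid generator used in this work: the function name, the relaxation factor `relax` and the choice to hold the boundary points fixed are all assumptions. The default of 15 iterations reflects the upper limit quoted in the text.

```python
import numpy as np


def laplacian_smooth(x, y, iterations=15, relax=0.5):
    """Relax interior grid points towards the mean of their four neighbours.

    x, y  : 2-D arrays of grid point coordinates (structured grid)
    relax : assumed under-relaxation factor, 0 < relax <= 1
    Boundary points are left untouched.
    """
    x = np.array(x, dtype=float)  # np.array copies, so inputs are unchanged
    y = np.array(y, dtype=float)
    for _ in range(iterations):
        for f in (x, y):
            # Neighbour average is formed from the values before the update.
            avg = 0.25 * (f[2:, 1:-1] + f[:-2, 1:-1]
                          + f[1:-1, 2:] + f[1:-1, :-2])
            f[1:-1, 1:-1] += relax * (avg - f[1:-1, 1:-1])
    return x, y
```

A uniform grid is left unchanged by this operator, while an abrupt change in point spacing is spread over neighbouring cells. Too many iterations would also wash out the intended clustering near the jet and the wall, which is one reason to keep the iteration count small.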

4.  RESULTS  AND  DISCUSSION 

The test case for the investigation is a laminar boundary layer
developing over a flat plate. A jet issues perpendicularly into
the supersonic crossflow from x/L = 1, where L = x_jet, the
position of the jet. The inflow conditions for the flowfield are
calculated for a Mach number of 2.61. The inflow conditions
of the jet are as follows: the jet pressure ratio, P_j/P_∞ = 7.0; the
temperature ratio, T_j/T_∞ = 1.0; and M_j = 1.0. The Reynolds
number of the flow is Re_L = 749,000, and the freestream
temperature is T_∞ = 300 K. These values were derived from the
data provided in reference [3].

4.1  Comparison  with  Experiment 

Other  than  the  results  obtained  to  test  the  grid  dependency  of 
the  solution,  all  of  the  numerical  results  produced  were 
calculated  on  a  100x100  grid  as  shown  in  figure  2.  The  grid 
has  6  points  in  the  jet,  and  contains  between  30  and  35  points 
in the boundary layer region. Simple boundary conditions
have been used everywhere. The flat plate is modelled using
no-slip conditions and is considered to be adiabatic; the inflow
condition is fixed at the initial condition; and all outflow
conditions use a simple linear extrapolation technique. The jet
has a rectangular profile, whereas the actual profile would be
closer to a quadratic profile. The jet boundary conditions are
fixed at the initial conditions for the injected flow.

Figure 3. Pressure distribution along the surface of
the flat plate.

The pressure distribution along the surface of the plate has
been compared with the experimental results produced by Zukoski
et al. [3] for the test case described above. The experiments
were done for a three-dimensional case, and the jet
injection hole was circular. The comparison is shown in
figure 3. This numerical solution was calculated using
limiter 1, with δ = 0.001.

Upstream of the jet, the results produced by the code compare
well with experiment. Downstream of the jet there is a large
discrepancy between the experimental and numerical data. This
discrepancy is probably due to three-dimensional effects
around the jet: some of the flow passes around the jet,
reducing the mass flow through it and just downstream of it.
Another possible cause is a turbulent region forming just
downstream of the jet. At this stage of the work, the numerical
code does not include a turbulence model, so this part of the
flow cannot be accurately represented. However, as an initial
foray into this problem, and for comparing the effects of
changing different parameters, the resolution of the results,
especially near and upstream of the jet, is adequate.

4.2  The  Entropy  Parameter 

In order to examine the effects of the entropy parameter, δ, as
defined in equation (15), numerical solutions for the test case
were found for seven different values of δ, varying from 0.001
to  1.  All  of  the  cases,  in  which  the  entropy  parameter  was 
being  studied,  were  calculated  using  Limiter  1.  This  limiter  is 
the  most  robust  one,  and  was  least  likely  to  fail  when  the  more 
extreme  values  of  the  entropy  parameter  were  being  tested. 

The skin friction and pressure distributions across the flat
plate, for different values of δ, are presented in figures 4 and
5, respectively. These plots show the effect of δ on the
numerical solution. The pressure plots show that as the
entropy parameter is increased the shock wave becomes less
well defined. For the highest values of the entropy parameter
the shock wave has smeared all the way to the jet injection
point. Also, by increasing the entropy parameter, the low
pressure region after the jet, denoting a recirculation region,
becomes damped out and the pressure plateau related to the
separation region upstream of the jet reduces considerably.
The skin friction distributions show the change in the point of
boundary layer separation, x_sep (i.e. where the skin friction
profile first crosses zero), with δ. As the entropy parameter
increases, x_sep moves closer to x_jet. It can be clearly seen
from both of these plots that the choice of δ must be
considered carefully.

Figure 4. A graph comparing the effects of
different values of δ on the skin friction distribution
across the flat plate.

Figure 5. A graph comparing the effects of
different values of δ on the pressure distribution
along the flat plate.

Figure 6. Velocity profiles in the boundary layer
for different values of δ.
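
The separation point used in this comparison, the location where the skin friction first crosses zero, can be extracted from a computed distribution with a short routine such as the sketch below. The function name and the linear interpolation between grid points are assumptions made for this example.

```python
import numpy as np


def separation_point(x, cf):
    """Return x where cf first changes from positive to non-positive.

    x, cf : 1-D arrays of streamwise position and skin friction
    Returns None if the skin friction never crosses zero.
    """
    x = np.asarray(x, dtype=float)
    cf = np.asarray(cf, dtype=float)
    crossing = np.nonzero((cf[:-1] > 0.0) & (cf[1:] <= 0.0))[0]
    if crossing.size == 0:
        return None
    i = crossing[0]
    # Linear interpolation between the two points bracketing the zero.
    return x[i] - cf[i] * (x[i + 1] - x[i]) / (cf[i + 1] - cf[i])
```

Applying the same routine to each solution gives a single number per value of δ, which makes the upstream or downstream movement of the separation point easier to compare than by reading the curves directly.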

In essence, changing the value of δ alters the minimum
amount of artificial viscosity that is added to the solution.
This has the undesirable effect of altering the shape of the
boundary layer. Boundary layer profiles plotted for locations
x/L = 0.319 and x/L = 0.879 are presented in figures 6 and 7. It
can be seen from figure 6, in which the boundary layer has not
yet separated, that as δ is increased, in general, the boundary
layer becomes thicker. The variation is complex, such that the
boundary layer thickness reaches a peak, and then it begins to
decrease slightly. This is caused by the addition of artificial
dissipation to the real viscosity, introduced by the
Navier-Stokes equations, in the boundary layer. Figure 7 shows
velocity profiles near the wall, after the boundary layer has
separated. The change in the boundary layer separation point
has a great effect on these profiles. The recirculation region
increases and the separated boundary layer moves further from the
flat plate as δ decreases, until δ reaches a value of 0.01.
Further lowering of δ moves the separated boundary layer
slightly closer to the flat plate. However, it can be seen that,
when values of δ below 0.01 are used, the boundary layer
profiles seem to become less sensitive to changes in δ, and
have converged.

An examination of the amount of artificial dissipation and real
dissipation added to the solution near the flat plate is shown in
figures (8a) to (8c). These figures represent dissipation
profiles in the boundary layer. The values plotted are the
artificial and real second-order terms for the x- and
y-momentum parts of G (see equation 10), respectively artificial
G2, real G2, artificial G3 and real G3. These terms have been
used because high gradients of variables are expected in the
direction normal to the wall. All of the graphs show the amount of
artificial dissipation added to the scheme compared to the real
dissipation. Not surprisingly, the addition of the artificial
dissipation seems to cause a considerable change in the real
dissipation profile, which must also have an effect on the
boundary layer thickness. Figures 8a-1 to c-1 are plots for the
unseparated boundary layer. They show that even for small
values of δ, the amount of artificial dissipation added to the
y-momentum equation is considerable compared to the real
viscosity. For the x-momentum equation the artificial
dissipation is comparatively small for low values of δ, but it
increases to an overwhelming amount for δ = 1.0. Figures 8a-2
to c-2 show the dissipation profiles for a separated boundary
layer. These plots show that even for very low values of the
entropy parameter, the artificial viscosity is still high enough
to be interfering with the solution. Artificial dissipation has
been introduced by the scheme to sharpen the definition of the
boundary layer as if it were a shock wave. Obviously this is not
necessary in the case of the boundary layer. However, the
effect of increasing δ seems to be an increase in the thickness
of the separated boundary layer.

Figure 7. Velocity profiles in the boundary layer
for different values of δ.

Many  of  the  methods  used  to  limit  the  introduction  of  the 
artificial  dissipation  in  the  boundary  layer  would  not  work  in 
the  separated  region.  Caughey  and  Varma  [6]  provide  two 
methods  of  limiting  the  artificial  dissipation  added  by  the 
scheme.  The  first  method  simply  sets  the  artificial  dissipation 
on  surfaces  to  zero.  The  other  method  scales  the  dissipation 
using a function of the local Mach number. Neither of these
methods is likely to have a great effect on limiting the
artificial dissipation term, as the artificial dissipation being
added  in  the  separated  region  is  neither  near  the  flat  plate,  nor 
in  a  relatively  low  velocity  region.  Also,  for  most  values  of 
the  entropy  parameter,  the  artificial  dissipation  added  at  the 
wall  is  relatively  close  to  zero. 

Figures 9a to c give a more qualitative view of the artificial
and real dissipation being added near the wall. Contour plots
of G2-real and G2-artificial for three values of δ have been
included. These plots cannot be directly compared with each
other, as the variation of the contour lines is different for each
case. They show general trends only. Of particular interest is
the small region just upstream of the jet where both the
artificial and real dissipations are being added in comparable
amounts.

Figure 8. A comparison of dissipation profiles, for different values of δ and different limiters,
through an unseparated (a-1 to e-1) and separated (a-2 to e-2) boundary layer. Each panel plots
the second-order dissipation terms against height above the plate, y/L.

Figure 10. Contour plots of pressure across the whole flowfield (a-1 to e-1) and Mach number near the flat plate (a-2 to e-2)
using several different limiters and values of δ.

Figure 11. Pressure distribution along the flat plate for
five different limiters.

Figures 10a-1 to c-1 show pressure contours for the whole
flowfield. These plots illustrate the effects of increasing δ on
the whole solution. The separation shock wave becomes less
well defined until it is not even clear that there is a
shock wave, and the shock wave caused by the reattachment of
the boundary layer, downstream of the jet, also becomes less
easy to recognise.

Figures 10a-2 to c-2 are contour plots of Mach number for the
flow near the wall. These plots illustrate the Mach disk, the
separation and reattachment of the boundary layer, and
regions of recirculation. The plots also show the slight
increase in boundary layer thickness with increasing δ. The
small pocket of recirculation just downstream of the jet is
reduced in size when large values of δ are used.

In conclusion, when δ is increased to a value of 0.5 and
beyond, the solution becomes highly inaccurate. However, for
values of the entropy parameter equal to and below 0.01, the
solution seems to be less sensitive to changes in δ. This
profile  represents  the  solution  where  very  little  artificial 
dissipation  is  being  added  to  the  boundary  layer  and  it  is  also 
the  solution  closest  to  the  experimental  pressure  data. 

Other methods of modelling the entropy parameter, see
equation (15b), have been defined by Muller [10,11] and Lin
[2]. Muller has used an entropic function of the local spectral
radii to model the entropy parameter. A brief examination of
this technique showed that for δ = 0.005, and using limiter 1,
the results produced were very similar to the results produced
using the method described by equation (15b). It is possible
that Muller's entropy function will be more effective when
used with other limiters, and this will be investigated in later
work.

Lin [2] states that the viscous flow results using the scheme
described in this investigation are unacceptable if a value of
δ = 0.25 is used. This agrees with the results presented here.
Lin suggests using a form of the entropy function which is
similar to that shown in equation (15b). Lin also uses
different values of the entropy parameter for the linear and
non-linear waves. This means a much smaller value for the
entropy parameter can be used for the linear waves, and hence
the boundary layer will be less affected by the artificial
dissipation term. An investigation of the effectiveness of this
concept is underway. The results presented by Lin are
encouraging.

Figure 12. Skin friction plots for the flat plate, for each
limiter.

4.3  Comparing  Limiters 

The  choice  of  flux  limiter  is  an  important  factor  in  TVD 
schemes.  Five  different  limiters  have  been  investigated  here. 
Limiter 1 is the well-known minmod limiter. Limiter 2 is the
limiter formulated by Van Leer, and limiter 5, known in the
literature as Roe's "Superbee", is highly compressive.

It  was  found  that  the  numerical  scheme  failed  when  limiter  5 
was  used  if  the  entropy  parameter  was  set  below  a  value  of 
0.05.  Therefore  in  order  that  the  limiters  could  be  compared, 
the  computed  solution  was  found  for  each  limiter  with  the 
entropy  parameter  set  at  a  value  of  0.05.  In  all  likelihood  this 
means  that  the  best  possible  results  for  limiters  2,  3  and  4  have 
not  been  found. 

Figures 11 and 12 show pressure and skin friction
distributions  along  the  flat  plate  for  each  of  the  limiters.  The 
pressure  distributions  show  that  the  major  effect  of  using 
different  limiters  is  to  change  the  point  of  boundary  layer 
separation.  The  shock  definition  on  the  surface  of  the  flat 
plate,  denoted  by  the  pressure  gradient  of  the  shock  wave,  is 
not  greatly  improved  by  using  a  more  compressive  limiter. 

From figures 11 and 12 it can be seen that limiters 2, 3 and 4
produce similar results. Also, these results compare well with
the results obtained using limiter 1 with values of δ below
0.01, as shown in figures 6 and 7, but generally it seems
limiters 2, 3 and 4 are better than limiter 1.

From  figure  12,  one  can  see  that  limiter  5,  the  "Superbee," 
does  not  seem  to  be  well  conditioned  for  this  problem,  and  is 
probably  unacceptable  for  use  when  solving  any  viscous  flow 
problem.  The  corresponding  skin  friction  distribution  is 
considerably  different  from  those  obtained  by  the  other 
limiters,  both  upstream  and  downstream  of  the  jet. 

Figures 13 and 14 present velocity profiles for each of the
limiters. The plots given in figure 13 are for the unseparated
boundary layer. As expected, the data suggest that limiters
with a more compressive nature reduce the thickness of the
boundary layer. In the case of the scheme using limiter 5, the
boundary layer thickness has been significantly reduced.
Again, limiters 2, 3 and 4 produce similar profiles. Limiter 1
produces a slightly thicker boundary layer than the others.

Figure 13. Velocity profiles in the unseparated
boundary layer for different limiters.

The results in figure 14 are velocity profiles for the boundary
layer after it has separated. Limiters 2, 3 and 4 produce similar
profiles; however, there is now a more significant degree of
difference in the results being obtained. The variation in
boundary layer thickness for different limiters is reversed
compared to the unseparated case. The more compressive
limiters produce a thicker boundary layer. Primarily this is
caused by the change in x_sep for the different limiters. These
results are similar to those obtained for low values of the
entropy parameter using limiter 1, as shown in figure 7.

Dissipation profiles for limiters 1, 3 and 5, shown in figures
8b, d and e respectively, indicate that for the unseparated
boundary  layer  the  artificial  dissipation  introduced  into  the  y- 
momentum  equation  is  high  for  limiter  3,  but  low  for  the  other 
two  limiters.  Limiter  5  introduces  artificial  dissipation  of  the 
opposite  sign  to  the  real  dissipation,  in  the  x-momentum 
equation.  This  result  explains  the  decrease  in  the  boundary 
layer  thickness  when  using  limiter  5.  Also  the  amount  of  real 
dissipation  being  added  when  limiter  5  is  being  used  is 
slightly  higher  than  for  the  other  limiters. 

The dissipation profiles for the separated boundary layer show
a marked difference between each of the limiters used. The
amount of real dissipation being added when limiter 5 is being
used is much smaller than that added when the other limiters
are used. The artificial dissipation being added to the
x-momentum equation also seems to be reduced, although there is a
region at the edge of the boundary layer where large amounts
of dissipation of the opposite sign are being added. Each of the
other limiters also adds this opposing dissipation, but not in
such a comparably vast quantity. However, the other limiters
add more dissipation in other regions of the boundary layer,
e.g. near the wall. Also, in contrast to limiters 1 and 3, the
artificial dissipation added to the y-momentum equation by
limiter 5 is of the opposite sign to the real dissipation being
added. These graphs go some way towards explaining why
the results from limiter 5 are so different to the results
obtained from the other limiters.

The contour plots of dissipation for limiters 1, 3 and 5 are
shown in figures 9b, d and e. Two main conclusions can be
drawn from these graphs. Firstly, on the real dissipation plots,
the concentration of the contours upstream of the jet is far
higher for limiter 5 than for the other limiters, and there is a
small region of concentrated contours downstream of the jet for
limiters 1 and 3 which is not as prominent for limiter 5.
Secondly, the artificial dissipation added by limiter 5 is quite
different in pattern to the dissipation added by the other limiters.

Figure 14. Velocity profiles for the separated boundary
layer for different limiters.

The pressure contours shown in figure 10 show how the more
compressive limiters produce better defined shock waves and
expansion fans. The movement of the separation shock wave
further upstream of the jet, for limiter 5, is also clearly
illustrated here. The Mach number contours show the change
in the thickness of the boundary layer for different limiters.
The plot for limiter 5 most clearly illustrates the reattachment
of the boundary layer. The recirculation region downstream
of the jet is much bigger for limiter 5 than for the other two
limiters presented.

The  results  presented  here  show  that  limiter  5  is  not  a  good 
choice  of  limiter  for  a  viscous  flow  problem,  because  of  its 
highly  compressive  nature.  Limiter  1  produces  acceptable 
results  if  the  value  of  the  entropy  parameter  is  limited  to  0.01 
for  this  test  case.  Limiters  2,  3  and  4,  the  mid-range  limiters 
including  the  Van  Leer  limiter,  produce  similar  results  and  are 
probably  best  suited  for  this  problem. 

4.4  Effects  of  Grid  Size 

In  order  to  check  the  dependence  of  the  solution  on  the 
available  number  of  grid  points,  the  numerical  code  was  run 
on  three  different  grids  with  different  point  concentrations. 
The  grid  sizes  used  for  this  study  were,  50x50,  100x100  and 
200x200. These calculations were all done using Limiter 3
and the entropy parameter, δ = 0.05.

A  comparison  of  the  pressure  distribution  along  the  flat  plate 
for  each  grid  is  shown  in  figure  15.  The  plots  show  a  severe 
degradation  of  results  between  the  different  meshes  used.  The 
most  coarse  grid  does  not  define  the  shock  wave  well,  and  the 
pressure  valley  and  plateau  upstream  of  the  jet  do  not  reach 
the  values  produced  by  the  other  two  grids.  The  main 
differences  between  the  200x200  and  the  100x100  grid  are  the 
improved  shock  definition  at  the  wall,  and  a  slight  increase  in 
the  pressure  plateau  for  the  higher  grid  concentration. 

The skin friction distribution comparison given in figure 16
shows a marked difference in profile for each grid. The
skin friction given by the 200x200 grid is far higher than for the
other grids, and the recirculation region just downstream of
the jet is much better defined when more points are used. The
differences can be attributed to the fact that, for a higher point
concentration, the boundary layer is better defined. The
location at which the boundary layer separates does not vary
greatly with a change in grid point concentration.

Figure 15. A graph showing the effects of grid size on
the pressure distribution along a flat plate.

Figure 16. A graph showing the skin friction profile
along the flat plate for different grids.

Figure 17. Velocity profiles for an unseparated
boundary layer for different grids.

Boundary layer velocity profiles are provided in figures 17
and 18 for each of the grids. The unseparated boundary layer
profiles,  presented  in  figure  17,  show  the  simple  improvement 
in  boundary  layer  definition  as  more  points  are  put  near  the 
wall.  Table  1  gives  the  number  of  points  in  the  boundary 
layer  for  each  case.  The  grid  used  earlier  in  the  report  does 
not  provide  an  accurately  defined  boundary  layer,  but  it  was  of 
acceptable  quality  for  the  comparative  study  being  performed. 

Figure  18  shows  velocity  profiles  in  the  separated  region  for 
each  of  the  grids.  In  the  recirculation  region  near  the  wall,  the 
100x100  and  200x200  grids  produce  similar  results.  The 
50x50  grid  produces  a  solution  with  a  larger  amount  of 
recirculation  than  the  other  two  grids. 

Both  sets  of  velocity  profiles  show  that  the  grid  concentration 
has  a  large  effect  on  the  calculation  of  the  boundary  layer,  and 
hence  the  rest  of  the  solution. 
Figure 18. Velocity profiles for a separated boundary
layer for different grids.


Figures 19 and 20 show the artificial dissipation terms added
to the x-momentum equation, profiled through the unseparated
and separated boundary layer, respectively. For the
unseparated boundary layer, the 100x100 and 200x200 grids
produce a similar profile, and near the wall the amount of
artificial dissipation being added for each of these cases is
quite close. The real differences in the profiles occur towards
the edge of the boundary layer. The amount of dissipation
provided by the scheme using the 50x50 grid is sometimes
twice as much as that introduced when the other grids are
being used.

In the case of the separated boundary layer, the 50x50 grid
seems to add less artificial dissipation than the other two grids.
Although it is not clear from figure 20, the dissipation being
added by the 100x100 and 200x200 grids is almost identical in
the recirculation region. It is only when the edge of the
separated boundary layer is reached that large differences
become apparent. There, the 100x100 grid seems to add far
more dissipation than the other two grids. It can be clearly
seen that in this region the grid is becoming coarser, and
numerical truncation errors are becoming dominant.

GRID SIZE    Points in the boundary layer
50x50        23
100x100      32
200x200      43

Table 1. Grid points in the boundary layer.

Figure 19. Dissipation profiles through an unseparated
boundary layer for different grids.

It  is  clear  from  this  brief  study,  that  grid  point  density  has  a 
great  effect  on  the  viscous  regions  of  the  solution,  and  the 
amount  of  artificial  dissipation  being  added  by  the  numerical 
scheme  should  not  be  analysed  in  isolation. 

5.  CONCLUSIONS 

The  numerical  scheme  described  in  this  study  has  been  used  to 
model  a  transverse  jet  interacting  with  a  supersonic  flow.  This 
test  case  differs  from  others  that  have  been  used  to  study 
artificial  dissipation,  as  the  problem  includes  a  separated 
boundary  layer,  and  reverse  flow  regions. 

This  scheme  can  be  used  to  produce  adequate  viscous 
supersonic  flows.  It  has  been  shown  here,  that  the  choice  of 
limiter  and  value  for  the  entropy  parameter  is  of  great 
importance  for  producing  good  results.  For  the  transverse  jet 
test case, limiters 2, 3 and 4 produce good results, and limiter
4  is  probably  best  suited.  Although  the  examination  of  the 
entropy  parameter  was  not  done  for  each  limiter  separately, 
when  limiter  1  was  used,  it  was  found  that  the  best  results 
were  produced  when  the  entropy  parameter  was  no  greater 
than  0.01.  It  is  reasonable  to  assume  that  this  range  of  values 
will  produce  acceptable  results  with  the  other  limiters. 
1 SUMMARY

In this contribution, we present a multi-dimensional
upwind scheme. In contrast to the Flux-Vector-Splitting or the
Flux-Difference-Splitting method, where an upwind
operator is used before the residual is calculated, this
scheme applies an operator to the discrete flux
integration, or flux balance, and then assigns filtered
parts of the residuals to the vertices of a cell.

The so-called Flux-Filter operator is derived
consistently on a one-dimensional basis, with
the aim of allowing stable updating. The scheme is
linearity preserving and should therefore lead to
improved accuracy.

The Flux-Filter scheme has been successfully
implemented for the Euler and Thin-Layer
Navier-Stokes equations, on structured and unstructured
grids. The unstructured grids are made of triangular
and quadrilateral cells.

List of Symbols

A           flux Jacobian matrix
D^(2)       artificial viscosity vector
E           energy
F           inviscid flux
I           identity matrix
P           pressure source vector
R^(p)       flux residual with p order of integration
S           surface
U           conservative vector
X           eigenvector matrix
a, b        convection velocity
p           pressure
q           preferential factor
t           time
u           velocity
u_ij^n      scalar value on point ij at time level n
Ψ           flux-filter matrix
Ω           volume
α           coefficient
δ           Kronecker delta
λ           eigenvalue
ν_x, ν_y    CFL number in x and y direction
σ           root of scheme
ω           root of spatial discretization

2  INTRODUCTION 

Upwind methods have become very popular over the
last decade and can be categorized into two major
groups, the Flux-Vector-Splitting and the
Flux-Difference-Splitting methods.

The schemes of Steger & Warming [1] and Van Leer
[2] are representative of the Flux-Vector-Splitting
schemes. Here, the fluxes are split into two parts, a
positive and a negative part. The positive fluxes have
purely positive eigenvalues and the negative fluxes purely
negative eigenvalues, so they can be differenced with
backward and forward upwinding, respectively.
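
The splitting idea can be illustrated on the scalar linear advection equation u_t + a u_x = 0, where the flux f = a u is split using a⁺ = (a + |a|)/2 and a⁻ = (a - |a|)/2. The sketch below is an assumption-laden illustration of this general principle only; it is not the Steger & Warming or Van Leer splitting of the Euler fluxes, and the function name is hypothetical.

```python
import numpy as np


def split_flux_derivative(u, a, dx):
    """First-order upwind df/dx for f = a*u using flux splitting.

    The positive part a_plus*u is differenced backward and the negative
    part a_minus*u is differenced forward. End points are left at zero.
    """
    u = np.asarray(u, dtype=float)
    a_plus = 0.5 * (a + abs(a))    # non-negative wave speed
    a_minus = 0.5 * (a - abs(a))   # non-positive wave speed
    dfdx = np.zeros_like(u)
    dfdx[1:-1] = (a_plus * (u[1:-1] - u[:-2])
                  + a_minus * (u[2:] - u[1:-1])) / dx
    return dfdx
```

For a > 0 only the backward difference is active, and for a < 0 only the forward difference, so information is always taken from the upwind side.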

The Flux-Difference-Splitting methods, or Riemann solvers, form the second group of schemes. Here the conservative variables are taken to be piecewise constant between the cell faces. At each face there is a fluid state on the left side and a different fluid state on the right side, which results in an interaction. In one dimension this interaction has an exact mathematical and physical solution. It is equivalent to the shock tube problem, also known as the Riemann problem. The most popular approximate Riemann solvers are those of Roe [3] and Osher [4].
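The face interaction can be illustrated with a Roe-type flux for the scalar inviscid Burgers equation, f(u) = u^2/2. This is a sketch under that simplifying assumption and not the Euler solver of [3]; no entropy fix is included, and the function names are hypothetical.

```python
def burgers_flux(u):
    """Flux of the inviscid Burgers equation."""
    return 0.5 * u * u


def roe_flux(u_left, u_right):
    """Roe-type numerical flux at a face between two constant states.

    For Burgers' equation the Roe-averaged wave speed is the arithmetic
    mean of the two states, since (f_R - f_L) / (u_R - u_L) = (u_L + u_R) / 2.
    """
    a_roe = 0.5 * (u_left + u_right)
    central = 0.5 * (burgers_flux(u_left) + burgers_flux(u_right))
    return central - 0.5 * abs(a_roe) * (u_right - u_left)
```

When the averaged wave speed is positive the result reduces to the left flux, and when it is negative to the right flux, so the face flux is taken from the upwind side.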

These schemes perform the upwinding by treating each space dimension separately, along the grid lines or along the normals of the cell faces. The disadvantage is that contact discontinuities which are not aligned with the grid are not properly resolved. To overcome this problem, a new group of methods has emerged since the early 1990s, the so-called multi-dimensional upwinding schemes. Two distinct families have been developed up to now: the Flux-Function methods [5] and the Flux-Fluctuation methods. A Flux-Function scheme calculates the flux through a face in a Riemann manner, but independently of the grid. The variation among Flux-Function methods lies in the 'Riemann' directions chosen, which mainly consist of the convection and pressure wave directions rather than the main grid directions. The Flux-Fluctuation methods are based on the flux integration over a cell, after which an upwinding method distributes the residual values to the cell's vertices.

The Flux-Filter scheme is a variant of the Flux-Fluctuation methods. Similar approaches have been developed by Rossow [6], Giles [7] and others, but this scheme differs in the way the flux residual is calculated and distributed to the vertices. The distribution is based on the characteristic propagation directions along the grid lines. The Flux-Filter is an operator which selects those quantities of a flux residual that propagate towards the cell vertex of interest. The scheme can be applied to structured and unstructured grids and can solve the Euler and Navier-Stokes equations. Stability analysis has shown that preferential direction flux integration and artificial viscosity are required. For structured grids a combined second and fourth order viscosity is implemented; for the unstructured grid method only a second order artificial viscosity has been introduced.

In the following section the basic idea of the Flux-Filter scheme is explained for the quasi one-dimensional Euler equations, followed by the extension to two dimensions. An analysis of the scalar model equations will highlight some problems associated with the trapezoidal flux integration. In the last section a number of results will be presented.


3  ONE  DIMENSIONAL  FLUX 
FILTER  SCHEME 

The discrete quasi 1-D Euler equations for a cell located between the grid points $i$ and $i-1$ are given by

$$\frac{\Delta SU}{\Delta t} + \frac{F_i - F_{i-1}}{\Delta x_{i-1/2}} - P_{i-1/2} = \frac{\Delta SU}{\Delta t} + R_{i-1/2} = 0 \qquad (1)$$

where $U$ is the conservative solution vector, $F$ the flux vector and $S$ the cross sectional area:

$$U = \begin{bmatrix} \rho \\ \rho u \\ E \end{bmatrix}, \qquad F = \begin{bmatrix} \rho u \\ \rho u^2 + p \\ (E + p)u \end{bmatrix}, \qquad P = \frac{dS}{dx}\begin{bmatrix} 0 \\ p \\ 0 \end{bmatrix} \qquad (2)$$

The residual $\Delta SU / \Delta t$ needs to be distributed over the grid points $i$ and $i-1$. The question is what part of the residual must be imposed on the left and on the right grid points. The distribution is obtained through an operator $\mathcal{F}$, which must be linearity preserving. Hence, the equation for a grid point $i$ is given by

$$\frac{\Delta SU_i}{\Delta t} + \mathcal{F}\,R_{i-1/2} + \mathcal{F}\,R_{i+1/2} = 0 \qquad (3)$$


3.1 The Flux-Filter Operator

The primary purpose of the Flux-Filter operator is to extract those elements of the residual which allow a stable update [8]. The solution of a numerical flow problem is reached when the numerical process has converged, which can be seen as a numerical equilibrium state. On a local scale, the changes imposed by the flux balance, or residual, must reduce the discrepancy in the flux balance. The flux balance for cell $i-1/2$ (between grid points $i-1$ and $i$) is given by

$$\frac{\Delta t}{\Delta x}\left(F(U_i) - F(U_{i-1})\right) = -\Delta U \qquad (4)$$

and suppose the flux balance for cell $i+1/2$ is satisfied, hence

$$\frac{\Delta t}{\Delta x}\left(F(U_{i+1}) - F(U_i)\right) = 0 \qquad (5)$$

The condition that adding a filtered portion of the residual to grid point $i$ reduces the discrepancy in the flux balance is

$$\frac{\Delta t}{\Delta x}\left(F(U_i + \mathcal{F}(\Delta U)) - F(U_{i-1})\right) = -(\Delta U - \alpha\Delta U) \qquad (6)$$

The influence of $\mathcal{F}(\Delta U)$ on equation (5) is

$$\frac{\Delta t}{\Delta x}\left(F(U_{i+1}) - F(U_i + \mathcal{F}(\Delta U))\right) = -\alpha\Delta U \qquad (7)$$

Stability demands that for all grid points the discrepancies disappear, or at least remain bounded by $\Delta U$. Hence the stability conditions are

$$0 < \alpha < 2 \qquad (8)$$

for equation (6) and

$$-1 < \alpha < 1 \qquad (9)$$

for equation (7). Both conditions lead to the requirement

$$0 < \alpha < 1 \qquad (10)$$

The Euler equations have the property that $F(U) = A(U)U$, where $A = \partial F / \partial U$. Rewriting equation (6) and assuming $\Delta U$ to be small leads to

$$\frac{\Delta t}{\Delta x}\left(A\,(U_i + \mathcal{F}(\Delta U)) - A\,U_{i-1}\right) = -\Delta U + \alpha\Delta U \qquad (11)$$

Subtracting equation (4) from equation (11) gives

$$\frac{\Delta t}{\Delta x}A\,\mathcal{F}(\Delta U) - \alpha\Delta U = 0 \qquad (12)$$


The Jacobian matrix $A$ possesses a complete set of real eigenvectors, hence

$$A = X \Lambda X^{-1} \qquad (13)$$

where $X$ is the eigenvector matrix and $\Lambda$ is the eigenvalue matrix. The operator $\mathcal{F}$ is now defined as

$$\mathcal{F} = X I^{\delta} X^{-1} \qquad (14)$$

where $I^{\delta}$ is a diagonal matrix with entries $\delta_i$ equal to 0 or 1. Hence

$$\left(\frac{\Delta t}{\Delta x} X \Lambda I^{\delta} X^{-1} - \alpha I\right)\Delta U = 0 \qquad (15)$$

The solution of this eigenvalue problem is

$$\alpha_i = \frac{\Delta t}{\Delta x}\lambda_i\delta_i \qquad (16)$$

Imposing condition (10) on all eigenvalues gives

$$0 < \frac{\Delta t}{\Delta x}\lambda_i\delta_i < 1 \qquad (17)$$

Therefore, to obtain a stable scheme, $\delta_i$ must be zero when $\lambda_i$ is negative, and $\frac{\Delta t}{\Delta x}(\lambda_i\delta_i)_{max} < 1$. The first condition defines the Flux-Filter operator:

$$\mathcal{F}^{+} = X I^{+} X^{-1} \qquad (18)$$

where $I^{+}$ is a diagonal matrix whose elements are 1 where the corresponding eigenvalue $\lambda_i$ is positive; $\mathcal{F}^{-}$ and $I^{-}$ are defined analogously for the negative eigenvalues. The second constraint imposes the well-known CFL condition for an explicit scheme.


3.2 Implementation of the Flux-Filter operator

The previous section has defined an operator which theoretically allows a stable iteration process. For its implementation three points must be taken into account:

1. The additional term $P$.

2. The distribution of the residual may not lead to any sources or sinks.

3. The filters may not numerically block the propagation of information.

The introduction of the $P$-vector would require a reformulation of the filter based upon the eigenvalues and eigenvectors of $F/\Delta x + P$. The determination of those eigenvalues and eigenvectors would lead to a severe increase in the numerical workload; therefore the Flux-Filter is based on the eigensystem of the flux.

In order to assure the conservation of the scheme, the introduction of the filter operators may not lead to additional sources, hence

$$\mathcal{F}^{+}(U_a)\,R_{i-1/2} + \mathcal{F}^{-}(U_b)\,R_{i-1/2} = R_{i-1/2} \qquad (19)$$

where the positive, respectively negative, Flux-Filter is set by a still undefined $U_a$, respectively $U_b$. This implies that

$$\mathcal{F}^{+}(U_a) + \mathcal{F}^{-}(U_b) = I \qquad (20)$$

which is always true when $U_a$ is identical to $U_b$. The obvious choices are

[...] (21)

This leads to the Flux-Filter scheme for the quasi one-dimensional case:

$$\frac{\Delta SU_i}{\Delta t} + \mathcal{F}^{+}_{i-1/2}\left(\frac{F_i - F_{i-1}}{\Delta x_{i-1/2}} - P_{i-1/2}\right) + \mathcal{F}^{-}_{i+1/2}\left(\frac{F_{i+1} - F_i}{\Delta x_{i+1/2}} - P_{i+1/2}\right) = 0 \qquad (22)$$

where

$$\mathcal{F}^{\pm}_{i-1/2} = X(U_{i-1/2})\, I^{\pm}(U_{i-1/2})\, X(U_{i-1/2})^{-1}$$

with $U_{i-1/2}$ denoting the flow state assigned to cell $i-1/2$.

The trivial solution for the steady state is

$$\frac{F_i - F_{i-1}}{\Delta x_{i-1/2}} - P_{i-1/2} = 0 \qquad (21)$$

for all cells of the computational domain. Unlike the traditional methods, the Flux-Filter formulation leads to the solution of the original discrete quasi one-dimensional Euler equations (Fig. 1).
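The construction of Section 3 can be sketched for a small linear model system. The example below assumes a constant 2x2 Jacobian A = [[u, c], [c, u]] with eigenvalues u - c and u + c, a unit cross section and no source vector P; none of these choices come from the paper, and the function names are hypothetical. A zero eigenvalue is assigned to the negative filter here so that the two filters always sum to the identity, which is also an assumption.

```python
import math


def matmul(a, b):
    """Product of two 2x2 matrices stored as nested lists."""
    return [[sum(a[i][k] * b[k][j] for k in range(2)) for j in range(2)]
            for i in range(2)]


def matvec(a, v):
    """Product of a 2x2 matrix with a 2-vector."""
    return [a[i][0] * v[0] + a[i][1] * v[1] for i in range(2)]


def flux_filters(u, c):
    """Return (F_plus, F_minus) = (X I+ X^-1, X I- X^-1) for A = [[u, c], [c, u]]."""
    s = 1.0 / math.sqrt(2.0)
    X = [[s, s], [-s, s]]        # columns: eigenvectors for u - c and u + c
    X_inv = [[s, -s], [s, s]]    # X is orthogonal, so its inverse is its transpose
    lam = [u - c, u + c]
    i_plus = [[1.0 if lam[0] > 0 else 0.0, 0.0],
              [0.0, 1.0 if lam[1] > 0 else 0.0]]
    i_minus = [[1.0 - i_plus[0][0], 0.0], [0.0, 1.0 - i_plus[1][1]]]
    return matmul(matmul(X, i_plus), X_inv), matmul(matmul(X, i_minus), X_inv)


def step(U, u, c, dt, dx):
    """One explicit update of the interior points of U_t + A U_x = 0."""
    A = [[u, c], [c, u]]
    f_plus, f_minus = flux_filters(u, c)
    # R[i] holds the residual of cell i-1/2, i.e. A (U_i - U_{i-1}) / dx.
    R = [None] + [matvec(A, [(U[i][k] - U[i - 1][k]) / dx for k in range(2)])
                  for i in range(1, len(U))]
    new = [list(v) for v in U]
    for i in range(1, len(U) - 1):
        from_left = matvec(f_plus, R[i])        # right-running part of cell i-1/2
        from_right = matvec(f_minus, R[i + 1])  # left-running part of cell i+1/2
        new[i] = [U[i][k] - dt * (from_left[k] + from_right[k]) for k in range(2)]
    return new
```

For a supersonic state (u > c) the positive filter becomes the identity and the negative filter the null matrix, so the update reduces to plain first order upwinding.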
4 TWO DIMENSIONAL FLUX-FILTER SCHEME

In this section, the construction of the two-dimensional Flux-Filter scheme [9] is outlined. The numerical equation for point $i,j$ of a structured grid becomes:

$$\frac{\Delta U_{i,j}}{\Delta t} + \mathcal{F}\cdot R_{i+1/2,j+1/2} + \mathcal{F}\cdot R_{i-1/2,j+1/2} + \mathcal{F}\cdot R_{i+1/2,j-1/2} + \mathcal{F}\cdot R_{i-1/2,j-1/2} = 0 \qquad (22)$$

where $\mathcal{F}$ is the two-dimensional Flux-Filter, which models the upwinding. The residual $R_{i+1/2,j+1/2}$ is the surface flux integration of the computational cell defined by the points $(i,j)$, $(i+1,j)$, $(i+1,j+1)$, $(i,j+1)$. Each cell which contains point $i,j$ contributes a filtered portion of its residual to the temporal change of point $i,j$. The purpose of the Flux-Filter is to extract the portion of the residuals which allows a stable update.


4.1 Two-Dimensional Flux-Filter Operator

The two-dimensional Flux-Filter is based on the one-dimensional operators defined in the last section. Several constructions have been tested.

Analysis has led to the following requirements for the two-dimensional Flux-Filter:

•  For conservation, the sum of all Flux-Filters for one cell must add up to the identity matrix, $\sum \mathcal{F} = I$. Since all residuals should vanish for a steady state solution, this may seem to be an unnecessary requirement. However, at a shock the temporal change becomes zero with residuals which are non-zero. If this constraint is not applied, the shock position and strength are incorrect.

•  If the cell has a supersonic velocity component pointing away from a grid point, then the corresponding Flux-Filter must be the null matrix. This reflects the fact that in this case no information can propagate towards this point.

This leads to the following scheme for quadrilateral cells (Fig 6.3):

if $\mathcal{F}^{-} \neq [0]$ and $[\ldots] \neq [0]$ then

[...]

else

$\mathcal{F}^{--} = [0]$ (24)

where $U$ is the averaged flow vector for the cell. This formulation reflects the region of dependency. The remaining filters, including $\mathcal{F}^{+-}$, are obtained similarly.

The conservation requirement is imposed with

[...] (25)

if $\mathcal{F} \neq [0]$ then

[...] (26)

where $n$ is the number of non-null matrices.

The formulation for a triangular cell is

if $\mathcal{F}_{ij}^{-} \neq [0]$ and $\mathcal{F}_{ik}^{-} \neq [0]$ then

[...] (27)

else

$\mathcal{F}_i = [0]$ (28)

where $U$ is the averaged flow vector for the cell. The filters $\mathcal{F}_j$ and $\mathcal{F}_k$ are obtained similarly.

The conservation requirement is imposed with

$$Q = I - (\mathcal{F}_i + \mathcal{F}_j + \mathcal{F}_k) \qquad (29)$$

if $\mathcal{F}_j \neq [0]$ then

$$\mathcal{F}_j = \mathcal{F}_j + Q/n \qquad (30)$$

where $n$ is the number of non-null matrices.


4.2 The Flux Residual

The flux residual, or flux balance, is the flux integral over the cell's circumference:

$$\frac{\partial U}{\partial t} + \frac{1}{\Omega}\oint_{\Pi} F\,n\,dS = 0 \qquad (31)$$

where $\Pi$ is the cell's surface. The first order discrete flux integration based on point $i,j$ for a quadrilateral cell is

$R^{(1)}_{i+1/2,j+1/2} = [\ldots]$ (32)

The second order flux integration, or trapezoidal integration, is given by

$R^{(2)}_{i+1/2,j+1/2} = \frac{1}{\Omega}\,[\ldots]$

Note the inclusion of the information attached to the vertex $i+1, j+1$. The first order integration violates the criterion that the flux calculation on a cell face be independent of the cell under consideration. It has the advantage that the resulting scheme is stable. The second order integration fulfils this criterion, but it has a stability problem, which is analyzed in detail below. For those reasons, a blend of first and second order integration is used, called the preferential direction integration, given by

[...] (34)

where the calculations have shown that, for accuracy and stability, the optimum value of $q$ is 0.25.

For reasons of stability, artificial viscosity must be introduced, which is of course an unwelcome feature. The artificial viscosity is a second and fourth order damping given by

[...] (36)

and

[...] (37)


4.3 Thin Layer Navier Stokes

The terms for the viscous fluxes are introduced on the right hand side of equation (22). Hence

$\frac{\Delta U_{i,j}}{\Delta t} + \mathcal{F}\cdot R_{i+1/2,j+1/2} + \mathcal{F}\cdot R_{i-1/2,j+1/2} + [\ldots]$

To obtain a meaningful viscous solution, the influence of the viscous term must exceed the influence of the artificial viscosity in the viscous dominated flow regions.


5 SCALAR MODEL EQUATION

In this section, we analyze the accuracy and stability aspects of the Flux-Filter scheme based on the model equation. The model equation for the multi-dimensional scheme is the two-dimensional convection equation:

$$\frac{\partial u}{\partial t} + a\frac{\partial u}{\partial x} + b\frac{\partial u}{\partial y} = 0 \qquad (39)$$

In this equation a quantity $u$ is convected with a velocity $a$ in the x-direction and a velocity $b$ in the y-direction. The theoretical solution preserves the initial function along the convection direction. The accuracy with which a numerical solution approaches this theoretical solution depends entirely on the numerical scheme.

The numerical model equation for a dimension-by-dimension first order upwind scheme, and for the Flux-Filter scheme with the first order flux integration, is

$$\frac{\Omega}{\Delta t}\left(u^{n+1}_{i,j} - u^{n}_{i,j}\right) = -(a\Delta y + b\Delta x)\,u_{i,j} + (a\Delta y)\,u_{i-1,j} + (b\Delta x)\,u_{i,j-1} \qquad (40)$$

for positive values of $a$ and $b$. The multi-dimensional upwinding scheme makes use of the trapezoidal integration of fluxes. Hence, the numerical model equation is given by

$$\frac{\Omega}{\Delta t}\left(u^{n+1}_{i,j} - u^{n}_{i,j}\right) = -\tfrac{1}{2}\left[(a\Delta y + b\Delta x)\,u_{i,j} + (b\Delta x - a\Delta y)\,u_{i-1,j} + (a\Delta y - b\Delta x)\,u_{i,j-1} - (a\Delta y + b\Delta x)\,u_{i-1,j-1}\right] \qquad (41)$$

The numerical behaviour of both methods is analyzed for convection directions of 0, 22.5 and 45 degrees for a delta function on the y-axis. Figure 2 presents the result of a 22.5 degree convection using the first order integration and using the trapezoidal integration method. For the first order integration, the diffusion is dominant for the convection direction of 45 degrees, while for a convection direction aligned with the x-axis the scheme produces the theoretical result. Because the numerical scheme produces considerable cross-diffusion when the convection direction is not aligned with the grid, the first order integration scheme is highly grid dependent.

The second order approach leads to solutions which match the theoretical solution for convection directions of 0 and 45 degrees. The solution for the 22.5 degree convection is dispersive. The accuracy levels of both methods are given in the next table, where the error is the mean square deviation from the theoretical solution:
Error

Convection direction (deg)    first order    second order
0                             0.00           0.00
22.5                          0.16           0.12
45                            0.23           0.00

Although the second order flux integration produces better results and is therefore less grid dependent, the transient phase can be extremely chaotic. The following case clarifies a peculiar problem. Setting

$$u_{0,j} = 0 \;\; \forall j, \qquad [\ldots] = 1 \qquad (43)$$

$$\alpha = \frac{\theta_y}{\theta_x}$$

with $\theta_x = a\Delta t / \Delta x$ and $\theta_y = b\Delta t / \Delta y$, the values for all $u_{i,j}$ can be determined as a function of $\alpha$. With the second order integration scheme

[...] (44)

hence, writing $u_j$ for the value in the first column of points next to the zero boundary,

$$u_j = -\frac{1-\alpha}{1+\alpha}\,u_{j-1} \qquad (45)$$

When $\alpha$ approaches zero, the scheme produces an undamped oscillatory profile in the complete domain. This reduces the robustness of the Flux-Filter scheme, since even a slight disturbance is immediately transmitted, in an unfavorable manner, throughout the numerical domain. In contrast, the profile for the first order integration scheme is

$$u_j = \frac{\alpha}{1+\alpha}\,u_{j-1} \qquad (46)$$

which does not show oscillatory behaviour. The problem can be remedied with the addition of artificial viscosity and/or the use of a preferential integration direction. The preferential integration is a blend of the trapezoidal integration and the first order integration (Eq. 34). Hence the preferential integration is a first order integration:

$$\begin{aligned}
u^{n+1}_{i,j} - u^{n}_{i,j} = {} & -(.5+q)(\theta_x+\theta_y)\,u_{i,j} \\
& -\left(-(.5+q)\theta_x + (.5-q)\theta_y\right)u_{i-1,j} \\
& -\left((.5-q)\theta_x - (.5+q)\theta_y\right)u_{i,j-1} \\
& -(.5-q)(-\theta_x-\theta_y)\,u_{i-1,j-1}
\end{aligned}$$

For the critical case $\theta_y = 0$ the profile becomes

$$u_j = -\frac{.5-q}{.5+q}\,u_{j-1} \qquad (48)$$

which means that for a positive value of $q$, $u_j$ decays exponentially. With this scheme the oscillatory behaviour is contained. The errors relative to the theoretical solution are given in the next table for the method with artificial viscosity and for the method with the preferential integration.


Error

Convection direction (deg)    artificial viscosity    preferential integr. q=0.25    preferential + a.v.
0                             0.069                   0.000                          0.038
22.5                          0.097                   0.093                          0.073
45                            0.081                   0.115                          0.095

Hence,  modifications,  such  as  artificial  viscosity  and 
preferential  flux  integration,  must  be  included  to 
stabilize  the  scheme.  The  pure  second  order  flux 
integration  scheme  will  not  work. 
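The role of the preferential factor in the model scheme can be checked numerically. The sketch below assumes the four-point stencil of the preferential integration for positive a and b, with weights (.5 + q) and (.5 - q) as in the equation above; the coefficient of the point (i, j-1) is taken from the requirement that the stencil coefficients sum to zero. The function names are hypothetical.

```python
def preferential_stencil(theta_x, theta_y, q):
    """Stencil coefficients c with u_new[i][j] = u[i][j] - sum(c * u).

    Keys are index offsets (di, dj) relative to point (i, j).
    q = 0 gives the trapezoidal integration, q = 0.5 the first order scheme.
    """
    return {
        (0, 0): (0.5 + q) * (theta_x + theta_y),
        (-1, 0): -(0.5 + q) * theta_x + (0.5 - q) * theta_y,
        (0, -1): (0.5 - q) * theta_x - (0.5 + q) * theta_y,
        (-1, -1): -(0.5 - q) * (theta_x + theta_y),
    }


def steady_ratio(theta_x, theta_y, q):
    """Steady-state ratio u_j / u_{j-1} next to a boundary column held at zero."""
    c = preferential_stencil(theta_x, theta_y, q)
    return -c[(0, -1)] / c[(0, 0)]
```

With theta_y = 0 the ratio equals -(.5 - q)/(.5 + q): its magnitude is one for q = 0 and smaller than one for any positive q, which is the damping of the oscillation described in the text.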


5.1  Von  Neumann  Stability  Analysis 

The results of a Von Neumann stability analysis [10] are presented in Figure 5, where the maximum amplification factor is plotted as a function of $\theta_x, \theta_y$ for $q = 0.25$ and a small amount of artificial viscosity. Stability is obtained when $\theta_x, \theta_y < 0.3$, which is the upper limit for the CFL number.

Each time discretization method has its own stability contour within which the roots of the space discretization must lie. Figure 4 presents the spatial roots for the Flux-Filter scheme for CFL numbers of 0.3, 0.5, 2.0 and 10.0. The conclusion is that the gain from using Runge-Kutta is minor and that, in theory, large CFL numbers can be used with an implicit scheme. In practice, however, the implicit scheme worked only up to a maximum CFL number of 1.0. The gain of a factor of 3 is not sufficient to justify the use of an implicit scheme, given the significant increase in workload.
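A Von Neumann analysis of this kind can be sketched for the scalar model scheme. The code below assumes the four-point preferential stencil of Section 5 with a single forward Euler step and without the artificial viscosity used in the paper, so it is not expected to reproduce Figure 5. The function names and the sampling resolution are hypothetical.

```python
import cmath
import math


def amplification(theta_x, theta_y, q, phi_x, phi_y):
    """Amplification factor G of one forward Euler step for a Fourier mode
    with phase angles phi_x, phi_y (radians per grid spacing)."""
    c00 = (0.5 + q) * (theta_x + theta_y)
    c10 = -(0.5 + q) * theta_x + (0.5 - q) * theta_y
    c01 = (0.5 - q) * theta_x - (0.5 + q) * theta_y
    c11 = -(0.5 - q) * (theta_x + theta_y)
    return 1.0 - (c00
                  + c10 * cmath.exp(-1j * phi_x)
                  + c01 * cmath.exp(-1j * phi_y)
                  + c11 * cmath.exp(-1j * (phi_x + phi_y)))


def max_amplification(theta_x, theta_y, q, n=64):
    """Largest |G| over an n-by-n sample of phase angles in [0, 2*pi)."""
    phases = [2.0 * math.pi * k / n for k in range(n)]
    return max(abs(amplification(theta_x, theta_y, q, px, py))
               for px in phases for py in phases)
```

A value above one for some phase pair indicates instability of that explicit step. For q = 0.5 the sketch reduces to first order upwinding, for which the maximum stays at one as long as theta_x + theta_y does not exceed one.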


6  RESULTS 

The supersonic wedge

The geometry is a two-dimensional channel with a 15° wedge followed by a 15° expansion corner (Fig 6). The inflow Mach number is set to 2. This configuration induces interactions between shock and expansion waves [11]. A shock wave is produced at the wedge and reflected at the upper boundary. The reflected shock wave is weakened by the expansion fan. The expansion fan is also reflected by the upper boundary. Depending upon the length of the channel, the shock wave and expansion fan are reflected several times. Analytical results predict a Mach number of 1.454 behind the first shock wave. The maximum deflection for this Mach number is 10.5°, which implies that the first reflection will induce a subsonic flow with an entropy layer (or slip stream). This problem is a widely used standard test case [11]. The grids are a structured 180 by 60 grid, an unstructured grid with 1664 quadrilateral elements and an unstructured grid with 3633 triangular elements. Figure 6 presents the Mach number distribution for the three grids.
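The analytical argument about the first reflection can be retraced with the standard oblique shock relations. The sketch below assumes a perfect gas with a ratio of specific heats of 1.4 and the weak shock branch; the function names and the scan resolution are hypothetical. It should give values close to those quoted above, although small differences from rounding or from the assumed gas constant are possible.

```python
import math

GAMMA = 1.4  # assumed ratio of specific heats


def deflection(mach, beta):
    """Flow deflection (rad) behind an oblique shock with wave angle beta (rad)."""
    m2 = mach * mach
    num = 2.0 * (m2 * math.sin(beta) ** 2 - 1.0)
    den = math.tan(beta) * (m2 * (GAMMA + math.cos(2.0 * beta)) + 2.0)
    return math.atan(num / den)


def max_deflection(mach, n=20000):
    """Return (theta_max, beta_at_max) in radians by scanning the wave angle."""
    mu = math.asin(1.0 / mach)
    best_theta, best_beta = 0.0, mu
    for k in range(1, n):
        beta = mu + (0.5 * math.pi - mu) * k / n
        theta = deflection(mach, beta)
        if theta > best_theta:
            best_theta, best_beta = theta, beta
    return best_theta, best_beta


def weak_shock_angle(mach, theta):
    """Weak-branch wave angle (rad) for a deflection theta (rad), by bisection."""
    theta_max, beta_max = max_deflection(mach)
    if theta > theta_max:
        raise ValueError("deflection exceeds the attached-shock limit")
    lo, hi = math.asin(1.0 / mach), beta_max
    for _ in range(60):
        mid = 0.5 * (lo + hi)
        if deflection(mach, mid) < theta:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


def downstream_mach(mach, theta):
    """Mach number behind the weak oblique shock for deflection theta (rad)."""
    beta = weak_shock_angle(mach, theta)
    mn1 = mach * math.sin(beta)
    mn2 = math.sqrt((1.0 + 0.5 * (GAMMA - 1.0) * mn1 ** 2)
                    / (GAMMA * mn1 ** 2 - 0.5 * (GAMMA - 1.0)))
    return mn2 / math.sin(beta - theta)
```

Calling `downstream_mach(2.0, math.radians(15.0))` and then `max_deflection` on the result shows that the 15° turning required at the upper wall is larger than the maximum attached-shock deflection, which is why a regular reflection is not possible there.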

NACA0012  Transonic 

This standard AGARD test case [12] consists of a NACA0012 profile in transonic flow. The angle of attack is 1.25° and the free stream Mach number is 0.8. The main features are a strong shock located at x = 0.62 on the upper surface and a weak shock at x = 0.37 on the lower side. The Mach number distribution is given in Figure 7.

This problem is solved with a Runge-Kutta scheme on a 160 by 50 structured grid. The maximum CFL number was 0.4. After 3000 iteration steps a pseudo-convergence was obtained, in which the upper shock position continued to move back and forth within about one grid cell.

The predicted Cl (0.3708) and Cd (0.0213) coefficients correspond well to those given by the AGARD test case [12].

NACA0012  Mach  0.5  Reynolds  5000 

This test case [13] demonstrates the use of the Flux-Filter scheme on the Navier-Stokes equations. The test case consists of a subcritical flow (Ma = 0.5) over a NACA0012 at a Reynolds number of 5000. The angle of attack is 0 degrees. The flow has a recirculation at the trailing edge. The location of separation given in the references is x = 0.82. The present scheme predicts the flow separation at x = 0.92 (Fig. 8), which indicates that the Flux-Filter scheme has an incorrect level of dissipation.

7  CONCLUSIONS 

The Flux-Filter scheme has been applied to the Euler and Navier-Stokes equations, on structured and unstructured grids, with Euler stepping, Runge-Kutta and implicit time integration. The computations have demonstrated several features. First, the Flux-Filter method is capable of obtaining highly accurate solutions on the basis of a truly multi-dimensional approach. Second, the stability analysis has shown that the scheme does not allow large time steps.

The spatial and temporal integration could be improved to allow much larger time steps and faster convergence. This would be advantageous for the 3-dimensional version. One way is the use of higher order flux integration or the use of flux limiters. The temporal integration can be improved by solving the implicit method more accurately or by implementing a multigrid scheme. The improvements should also be aimed at reducing the level of artificial viscosity or eliminating its use.

Extension to a 3-dimensional Flux-Filter scheme is in progress.
Figure 2 Solution of the scalar model equation with first order and second order flux integration, respectively.

Figure 6 Supersonic flow over a 15 degree wedge solved on different grid types. Panels: unstructured grid, 1664 quad elements, 1759 nodes (advancing front technique); unstructured grid, 3633 tri elements, 1911 nodes (advancing front technique).

Figure 5 Amplification factor and roots (CFL = 0.3) for the Flux-Filter scheme.

Figure 7 Transonic flow over a NACA0012 profile.

Figure 8 Navier-Stokes solution around NACA0012 with a Reynolds number of 5000.

Abstract

The paper reviews recent developments in multidimensional upwind schemes based on the residual decomposition or fluctuation splitting approach. Unlike the standard finite volume approach, the upwinding is based on multidimensional physics, e.g. convection of entropy and total enthalpy along the streamline and convection of acoustic Riemann invariants along the Mach lines in steady supersonic flow. The resulting schemes on triangles and quadrilaterals are very compact, with stencils consisting of nearest neighbours only, and can be made monotonic and second order, like the TVD schemes in finite volumes. Numerical examples show the improved performance compared to state-of-the-art methods. The paper further describes the introduction of convergence acceleration techniques which exploit the compactness of the stencils, and the implementation of solution adaptive error control. The latter is based on scalar finite element a posteriori error estimates, which are applied to the Euler system in decoupled form thanks to the multidimensional residual decompositions.


1 THE CASE FOR MULTIDIMENSIONAL UPWINDING


At the heart of present day upwind schemes for computing compressible flows is the solution of the one-dimensional Riemann problem: it describes the evolution of the flow which results from bringing into contact two fluids at constant but different states.

Conservative Finite Volume Methods use this building block as follows: writing the conservation law for a given cell, the cell-face fluxes in the spatial operator are evaluated by solving, for each face in turn, the one-dimensional Riemann problem defined by the cell averages (or a reconstruction at the cell face) on either side, thereby assuming a series of one-dimensional problems in the direction of the cell-face normals.

Although this approach is extremely successful, the question arises how truly multidimensional physics could be brought into this picture, and what benefits could be expected from doing so.

Consider therefore the case of the steady Euler equations. Choosing entropy, total enthalpy and the components of velocity as the independent variables, the equations take the following familiar quasilinear form


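$$\begin{aligned}
\vec{u}\cdot\nabla S &= 0 \\
\vec{u}\cdot\nabla H &= 0 \\
\left(1 - \frac{u^2}{c^2}\right)u_x - \frac{uv}{c^2}(u_y + v_x) + \left(1 - \frac{v^2}{c^2}\right)v_y &= 0 \\
v_x - u_y + a_1\,\partial_n S + a_2\,\partial_n H &= 0.
\end{aligned} \qquad (1)$$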


The first two equations express that entropy $S$ and total enthalpy $H$ are Riemann invariants along the streamlines, which are the field lines of the velocity vector $\vec{u}$ with Cartesian components $u$ and $v$. The third equation is the familiar compressible potential equation written in primitive variables $u$ and $v$, with $c$ the sound speed. The fourth equation is the vorticity equation (Crocco's equation), which is coupled to entropy and total enthalpy through derivatives in the direction $n$ normal to the streamline, $\partial_n = -\frac{v}{q}\partial_x + \frac{u}{q}\partial_y$, where $q = \sqrt{u^2 + v^2}$ is the norm of the velocity vector. The coefficients $a_1$ and $a_2$ are given by

$$a_1 = -\frac{1}{(\gamma - 1)\rho q}, \qquad a_2 = \frac{1}{q}, \qquad (2)$$

corresponding to the definition $\partial S = \partial p - c^2\partial\rho$. For shock free flow with uniform homentropic and homenthalpic inlet conditions, the first two equations have the trivial solution $S(x,y) = \text{const}$, $H(x,y) = \text{const}$, leading to the potential formulation for steady irrotational flow.

At this stage of the analysis, it is instructive to recall that the first two equations are ordinary differential equations, which can be integrated by marching along the streamline starting at the inlet boundary, a procedure commonly known as the method of characteristics. It is in fact remarkable that this idea of upwinding entropy and total enthalpy along the streamline is totally absent in the state-of-the-art conservative methods for solving the steady Euler equations. One of the key aims of multidimensional upwinding methods is precisely to re-introduce this idea in a conservative formulation. Note further that these two equations are equally valid in 3D.

Turning our attention back to the two remaining equations, the analysis simplifies by considering a streamline-aligned coordinate system with coordinates $X$ in the streamline direction and $Y$ in the normal direction. In the new axes, the velocities are denoted by $U = q$, $V = 0$, giving for the Euler equations (1):


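$$\begin{aligned}
\partial_X S &= 0 \\
\partial_X H &= 0 \\
\partial_X U - \frac{1}{M^2 - 1}\,\partial_Y V &= 0 \\
\partial_X V - \partial_Y U + a_1\,\partial_Y S + a_2\,\partial_Y H &= 0,
\end{aligned} \qquad (3)$$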


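For supersonic flow, the latter two equations can be expressed as ordinary differential equations along the Mach lines $\Gamma^{+}$ and $\Gamma^{-}$ (see Figure 1), by introducing the acoustic characteristic variables

$$\begin{aligned}
\partial C^{+} &= a_1\,\partial S + a_2\,\partial H - \partial U + \frac{1}{\sqrt{M^2 - 1}}\,\partial V \\
\partial C^{-} &= a_1\,\partial S + a_2\,\partial H - \partial U - \frac{1}{\sqrt{M^2 - 1}}\,\partial V
\end{aligned} \qquad (4)$$

In these variables the Euler system for steady supersonic flow can be written as

$$\begin{aligned}
\partial_X S &= 0 \\
\partial_X H &= 0 \\
\partial_X C^{+} + \frac{1}{\sqrt{M^2 - 1}}\,\partial_Y C^{+} &= 0 \\
\partial_X C^{-} - \frac{1}{\sqrt{M^2 - 1}}\,\partial_Y C^{-} &= 0,
\end{aligned} \qquad (5)$$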


where $\tan\mu = \frac{1}{\sqrt{M^2 - 1}}$ defines the Mach angle $\pm\mu$ between Mach lines and streamline, see Figure 1. Again, the physics dictate a multidimensional upwinding discretization of the acoustic equations by a space operator upwinded along the Mach lines, as in the method of characteristics. The point of deviation from the method of characteristics is to achieve this in a conservative formulation, such that shocks and contacts can be handled without any special treatment. This indeed is the hallmark and basis of success of the state-of-the-art Finite Volume approach. The price paid in these methods, however, is that the upwinding is based on locally one-dimensional physics, by considering the states adjacent to each finite volume face as the initial data for a one-dimensional Riemann problem in the direction of the face normal. Such an approach precludes the use of Mach lines or the streamline as the upwinding directions.

Fig. 1: Mach angles $\pm\mu$ and streamline coordinate system $(X, Y)$

Paper presented at the AGARD FDP Symposium on "Progress and Challenges in CFD Methods and Algorithms" held in Seville, Spain, from 2-5 October 1995, and published in CP-578.
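As a check on the step from the streamline-aligned system to the characteristic form, the following short derivation combines the two acoustic equations directly. It assumes that the coefficients $a_1$, $a_2$ and the Mach number are treated as locally constant, and it uses the shorthand $b = \sqrt{M^2 - 1}$, which is introduced here only for brevity.

```latex
% Acoustic variables: dC^{\pm} = a_1 dS + a_2 dH - dU \pm (1/b) dV, with b = \sqrt{M^2 - 1}.
\begin{aligned}
\partial_X C^{\pm} \pm \frac{1}{b}\,\partial_Y C^{\pm}
 &= \underbrace{a_1\,\partial_X S + a_2\,\partial_X H}_{=\,0}
    \;-\; \Big(\partial_X U - \frac{1}{b^{2}}\,\partial_Y V\Big)
    \;\pm\; \frac{1}{b}\Big(\partial_X V - \partial_Y U + a_1\,\partial_Y S + a_2\,\partial_Y H\Big) \\
 &= 0 .
\end{aligned}
```

The first bracket vanishes by the third equation of the streamline-aligned system and the second bracket by the fourth, so each acoustic variable is constant along the direction $dY/dX = \pm 1/b = \pm\tan\mu$, that is, along a Mach line.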

For subsonic flow, the two acoustic equations form an elliptic subset, and it is less clear what the optimal space discretization should be.

To fix ideas, consider again the Euler system in the form of eq. (3), but assume that the inlet conditions are such that the flow is irrotational, so that the third and fourth equations decouple from $S$ and $H$:

$$\begin{aligned}
\partial_X U - \frac{1}{M^2 - 1}\,\partial_Y V &= 0 \\
\partial_X V - \partial_Y U &= 0
\end{aligned} \qquad (6)$$

For subsonic flow, the eigenvalues of the system matrix (and hence the Mach angles) become complex, $\lambda = \pm i/\sqrt{1 - M^2}$. For $M = 0$ this is the set of Cauchy-Riemann equations governing incompressible potential flow.
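The change of type at the sonic point can be illustrated by computing the characteristic slopes of the two-equation system. The sketch assumes the system is written as w_X + B w_Y = 0 for w = (U, V), with B = [[0, -1/(M^2 - 1)], [-1, 0]]; the function name is hypothetical.

```python
import cmath


def characteristic_slopes(mach):
    """Eigenvalues of B for w_X + B w_Y = 0, w = (U, V).

    The eigenvalues satisfy lambda^2 = 1 / (M^2 - 1): real for supersonic
    flow (hyperbolic), purely imaginary for subsonic flow (elliptic).
    """
    if mach == 1.0:
        raise ValueError("the system is singular at the sonic point")
    root = cmath.sqrt(1.0 / (mach * mach - 1.0))
    return root, -root
```

For a supersonic Mach number the two values are the real slopes of the Mach lines, plus and minus tan(mu); for a subsonic Mach number they form a purely imaginary pair, so no real marching direction exists.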


Different ways of discretizing such a system will be discussed in section 2. All these methods are based on an unsteady version of eqn (5) or (6), whereby the unsteady terms are in general chosen not to be physical, but such that the resulting system is hyperbolic in time and achieves fast convergence to the steady state.

Such a choice [1] is given by the following system, valid for both subsonic and supersonic flow, and called the hyperbolic/elliptic splitting. Defining the characteristic variables


aw  =  f 


.w/3ac+ 

MpdC- 
OH  as 


pc(-i-l) 

dS 


(7) 


it  is  given  by 


aw 

at 


+ 


+ 


Xi'" 

0 

0 

K 

0 

0 

0 


Xi' 

0  1 

0  0 

0  0  0 
0  0 


0  0 
0  0 


aw 

ex 


e 

0  0  0 

0  0  0 


aw 

av 


(8) 


or  in  Cartesian  coordinates 


$\frac{\partial W}{\partial t} + A_W \frac{\partial W}{\partial x} + B_W \frac{\partial W}{\partial y} = 0.$        (9)


Here  rz'*'  and  iz  are  defined  as  iz+  =  ' 

and  ,v  and  p  are  given  by  p  = 
V'max(eMM2  -  1|)  and  y  =  -  To  circumvent 

the  singularity  at  the  sonic  point,  e  is  different  from  zero 
and  given  a  small  value  (typically  0.05). 

Clearly, the third and fourth characteristic equations decouple for all flow regimes, implying as before that entropy and total enthalpy are conserved along streamlines in the steady state. Considering the first and second equations, $\nu^- = 0$ and $\nu^+ = 1$ for supersonic flow, and the equations are fully diagonal; the system is in fact identical to eqn (5), where the acoustic variables are made to propagate along the Mach lines. In the subsonic case, the system is no longer diagonal and the two acoustic equations become coupled and form a system which is elliptic at steady state.

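The residual in conservative variables is obtained by transforming eqn (9), giving

$\frac{\partial F}{\partial x} + \frac{\partial G}{\partial y} = R \left( A_W \frac{\partial W}{\partial x} + B_W \frac{\partial W}{\partial y} \right)$        (10)

where $R$ is the transformation matrix from characteristic to conservative variables.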


2  RESIDUAL  DISTRIBUTION  SPACE 
DISCRETIZATION 

The finite volume setting with its underlying discontinuous solution representation naturally leads to the definition of 1D Riemann problems at the discontinuous interfaces, although some progress has been made in the solution of three-state two-dimensional Riemann problems [2].

Therefore,  in  this  work,  we  concentrate  on  approaches 
based  on  a  continuous  representation  of  the  solution  over 
structured  or  unstructured  meshes,  with  the  solution 
stored  at  the  vertices.  Such  a  framework,  which  allows 
easy  incorporation  of  upwinding  concepts,  is  provided  by 
the  residual  distribution  approach  : 

•  in a first step, the conservative flux balance (cell residual) is evaluated over a cell with unknowns located in the vertices by a simple contour integration (e.g. trapezium or midpoint rule).

•  in a second step the cell residuals are distributed to the vertices to form the nodal space operator (or nodal residual) which becomes a weighted average of the adjacent cell residuals.

The space discretization is consistent and conservative, which is easily shown by summing up the discrete equations for all nodes and observing that the interior fluxes vanish, leaving only the contributions from the boundaries.

Various residual distribution schemes have been proposed, such as the central scheme of Jameson [3] or the Lax-Wendroff schemes of Ni [4], Hall [5] and Morton [6]. In the present context the residual distribution framework has been used to formulate conservative multidimensional upwind advection schemes. At the scalar level, linear and non-linear advection schemes are obtained by distributing the cell residual to the downstream nodes only. In this way, properties such as positivity and second order accuracy (linearity preservation) [7, 8, 9] can be built in. As long as the distributed parts sum up to the conservative cell residual, the schemes satisfy discrete conservation.

The application to the Euler equations, e.g. for advection of entropy, total enthalpy and acoustic variables, is straightforward, provided that a conservative linearization can be found which ensures that the flux balance over a cell can be written exactly in terms of the quasilinear equations discussed in section 2.


2.1  Distribution schemes for scalar advection

The subject of multidimensional shock-capturing advection schemes on triangles has been extensively treated in previous publications, and the reader is referred to [9, 10], as well as to the work of Roe and Sidilkover [8, 11] for details. Only the most important aspects are recalled here, and the extension to quadrilaterals is briefly discussed. Consider the linear advection equation, with constant $\vec{\lambda}$:

$\frac{\partial u}{\partial t} + \vec{\lambda} \cdot \nabla u = 0$        (11)

The corresponding integral form of eqn (11) is obtained by integrating over a control volume $\Omega$ (triangle or quadrilateral). This leads to the definition of the cell residual or fluctuation,

$\phi^T = \iint_\Omega \vec{\lambda} \cdot \nabla u \, d\Omega = \oint_\Gamma u \, \vec{\lambda} \cdot \vec{n}_{ext} \, d\Gamma,$        (12)


where $\Gamma$ is the boundary of the control volume $\Omega$. Because the solution is stored in the vertices of the cell, the contour integral can be easily evaluated by the trapezium rule. In the fluctuation-splitting approach, fractions of $\phi^T$ are sent to the cell vertices, which after assembling contributions from all cells leads to the nodal update. The semi-discretization at point $i$ is then

$S_i \frac{du_i}{dt} = -\sum_T \beta_i^T \phi^T$        (13)
$\qquad\quad = -R(u_i)$        (14)

where $S_i$ is the area of the median dual cell around node $i$ and the $\beta_i^T$ are the distribution coefficients, which sum up to one for each cell. The way these coefficients are evaluated determines the properties of the scheme. The most important of these are:


•  Positivity

A monotonic scheme can be obtained by demanding positivity. Suppose that the numerical solution at mesh point $i$ is $u_i$. Then the positivity property requires that in the discrete form of eqn (11)

$S_i \frac{u_i^{n+1} - u_i^n}{\Delta t} = \sum_k c_{ki} \, (u_k^n - u_i^n)$        (15)

the coefficients $c_{ki}$ are all non-negative for $k \neq i$. Stability and monotonicity preservation are then guaranteed under the CFL-like condition

$\Delta t \le \frac{S_i}{\sum_k c_{ki}}$        (16)

Indeed, if $u_i^n$ is a local maximum, i.e. $(u_k^n - u_i^n) \le 0 \;\; \forall k$, then $\partial u_i / \partial t \le 0$. Consequently a local maximum cannot increase and similarly a local minimum cannot decrease. The condition (15) is called global positivity and is difficult to impose for fluctuation splitting schemes. Therefore a more restrictive property is introduced, namely local positivity, see [12]. This means that condition (15) is imposed for each contributing control volume in eqn (14), which is very easy to check. Positivity will in general be linked to some upwinding. In the fluctuation splitting context upwind biasing is obtained by limiting the distribution of the cell residual to the downstream nodes.


•  Linearity Preservation or Residual Property

Second order truncation error in the steady state is obtained by demanding that no updates are sent to the vertices if the cell residual is zero. This is obtained when the distribution coefficients $\beta_i^T$ are bounded, such that

$\beta_i^T \phi^T \to 0$ when $\phi^T \to 0$        (17)


It can be proven that only non-linear schemes (a scheme is called linear if the coefficients $c_{ki}$ in eqn (15) are independent of $u_k$) can satisfy both properties.
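
The role of the positivity condition can be illustrated with a minimal sketch, which is not part of the original text. The function name `positive_update` and all sample values are hypothetical; the step is the explicit form of eqn (15), taken here with half of the largest time step allowed by the CFL-like condition (16).

```python
def positive_update(u_i, neighbours, coeffs, area, dt):
    """One explicit step of eqn (15): S_i (u_new - u_i)/dt = sum_k c_k (u_k - u_i)."""
    if any(c < 0.0 for c in coeffs):
        raise ValueError("positivity requires non-negative coefficients")
    if dt * sum(coeffs) > area:
        raise ValueError("time step violates the CFL-like condition (16)")
    increment = sum(c * (u_k - u_i) for c, u_k in zip(coeffs, neighbours))
    return u_i + dt / area * increment


# Hypothetical data: the node value 1.0 is a local maximum among its neighbours.
u_i = 1.0
neighbours = [0.2, 0.5, 0.9]
coeffs = [0.3, 0.1, 0.4]
area = 2.0
dt = 0.5 * area / sum(coeffs)  # half of the limit given by condition (16)

u_new = positive_update(u_i, neighbours, coeffs, area, dt)
print(u_new)
```

Under these assumptions the new value is a convex combination of the old nodal values, so the local maximum cannot grow.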


2.1.1  Schemes  on  triangles 

Considering the triangle with inward normals $\vec{n}_i$ shown in Figure 2, the fluctuation $\phi^T$, eqn (12), can be written

$\phi^T = \sum_{i=1}^{3} k_i u_i, \qquad k_i = \frac{1}{2} \vec{\lambda} \cdot \vec{n}_i$        (18)

The $k_i$ are convenient parameters in the design of upwind schemes. Since the inward normals $\vec{n}_i$ sum up to zero, one also has $\sum_i k_i = 0$. Four important distribution schemes are:


Fig. 2: Triangle and inward normals $\vec{n}_i$


The N and PSI or limited N scheme

Define $k_i^+ = \max(0, k_i)$ and $k_i^- = \min(0, k_i)$, then the distribution to the nodes for the N scheme is given by:

$\beta_i^N \phi^T = k_i^+ (u_i - u_{in})$        (19)

where

$u_{in} = \left( \sum_j k_j^- \right)^{-1} \sum_j k_j^- u_j$        (20)

This scheme is positive but not linearity preserving. However, among the linear positive schemes, it has the lowest cross-diffusion. It is also the scheme with the narrowest stencil, hence the name N scheme.
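
The following is a minimal sketch of the N scheme on a single triangle, not taken from the paper. It assumes counter-clockwise vertex ordering, the convention $\phi^T = \sum_i k_i u_i$ with $k_i = \frac{1}{2}\vec{\lambda}\cdot\vec{n}_i$, and an inflow state formed as the $k_j^-$-weighted average of the nodal values. All function names and the sample data are hypothetical.

```python
def inward_scaled_normals(xy):
    """Inward normals of a counter-clockwise triangle, scaled by edge length.

    Normal i belongs to the edge opposite vertex i.
    """
    normals = []
    for i in range(3):
        (xa, ya), (xb, yb) = xy[(i + 1) % 3], xy[(i + 2) % 3]
        normals.append((ya - yb, xb - xa))
    return normals


def upwind_parameters(xy, lam):
    """k_i = 0.5 * lambda . n_i for the three vertices."""
    return [0.5 * (lam[0] * nx + lam[1] * ny)
            for nx, ny in inward_scaled_normals(xy)]


def fluctuation_trapezium(xy, lam, u):
    """Contour integral of u * lambda . n_ext, trapezium rule on each edge."""
    total = 0.0
    for i, (nx, ny) in enumerate(inward_scaled_normals(xy)):
        a, b = (i + 1) % 3, (i + 2) % 3
        lam_dot_n_ext = -(lam[0] * nx + lam[1] * ny)  # outward = minus inward
        total += 0.5 * (u[a] + u[b]) * lam_dot_n_ext
    return total


def n_scheme(k, u):
    """Contributions k_i^+ (u_i - u_in) sent to the three vertices."""
    k_plus = [max(0.0, ki) for ki in k]
    k_minus = [min(0.0, ki) for ki in k]
    sum_k_minus = sum(k_minus)
    if sum_k_minus == 0.0:  # all k_i vanish: nothing to distribute
        return [0.0, 0.0, 0.0]
    u_in = sum(km * ui for km, ui in zip(k_minus, u)) / sum_k_minus
    return [kp * (ui - u_in) for kp, ui in zip(k_plus, u)]


xy = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0)]  # hypothetical triangle
lam = (1.0, 0.5)                            # hypothetical advection vector
u = [1.0, 2.0, 4.0]                         # hypothetical nodal values

k = upwind_parameters(xy, lam)
phi = sum(ki * ui for ki, ui in zip(k, u))
contributions = n_scheme(k, u)
print(k, phi, contributions)
```

In this example vertex 0 is the only upstream node, so it receives nothing, and the two downstream contributions add up to the cell fluctuation, which is the discrete conservation property.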

From this scheme the contributions for the non-linear PSI scheme can be constructed by limiting the N scheme distribution as follows

$\beta_i^{PSI,T} = \frac{\max(0, \beta_i^N)}{\sum_j \max(0, \beta_j^N)}$        (21)

This is identical to Sidilkover's general limiter formula [13] when the MinMod function, $\psi(r) = \frac{1}{2}(1 + \mathrm{sgn}(r)) \min(r, 1)$, is chosen. This scheme is both positive and linearity preserving.
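
As a sketch of the limiting step (not from the paper), the PSI contributions can be obtained from given N-scheme contributions by clipping the negative distribution coefficients and renormalizing. The function name `psi_contributions` and the sample N-scheme contributions are hypothetical.

```python
def psi_contributions(phi_n):
    """Limit N-scheme contributions: beta_i = max(0, beta_i^N) / sum_j max(0, beta_j^N)."""
    phi = sum(phi_n)                      # cell fluctuation (conservation)
    if phi == 0.0:
        return [0.0 for _ in phi_n]       # zero residual: no update is sent
    beta_n = [p / phi for p in phi_n]     # N-scheme coefficients, may be unbounded
    beta_plus = [max(0.0, b) for b in beta_n]
    total = sum(beta_plus)                # > 0 because the beta_n sum to one
    return [b / total * phi for b in beta_plus]


# Hypothetical N-scheme contributions of one triangle with two downstream nodes.
phi_n = [0.0, 1.5, -0.25]
phi_psi = psi_contributions(phi_n)
print(phi_psi)
```

The limited coefficients lie between zero and one, so the contributions vanish with the fluctuation, while their sum still equals the cell fluctuation.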


The FV and limited FV scheme

For the first order upwind finite volume (FV) scheme on the median dual mesh the normals of the dual grid are needed, see Figure 3. In terms of these normals the fluctuation is given by

$\phi^T = k_a (u_3 - u_1) + k_b (u_3 - u_2) + k_c (u_2 - u_1)$        (22)

Depending on the signs of the dot products of the advection vector with the normals of the dual grid, $\phi^T$ is distributed to the nodes according to the formula

$\beta_1^{FV} \phi^T = k_a^- (u_3 - u_1) + k_c^- (u_2 - u_1)$
$\beta_2^{FV} \phi^T = k_b^- (u_3 - u_2) + k_c^+ (u_2 - u_1)$        (23)
$\beta_3^{FV} \phi^T = k_a^+ (u_3 - u_1) + k_b^+ (u_3 - u_2)$

Again this scheme is positive, but not linearity preserving, and is more diffusive than the N scheme. The limited second order version of this scheme is obtained by applying eqn (21) to the distribution coefficients $\beta_i^{FV}$.


Fig. 3: Normals for the triangle FV scheme


The LDA scheme

For the linear LDA (Low Diffusion A) scheme, the contributions are given by:

$\beta_i^{LDA} \phi^T = k_i^+ \left( \sum_j k_j^+ \right)^{-1} \phi^T$        (24)

This scheme is linearity preserving, however it is not positive.

The Lax-Wendroff scheme

The distribution coefficients for the classical Lax-Wendroff scheme are

$\beta_i^{LW} = \frac{1}{3} + \frac{\Delta t}{2 S_T} k_i$        (25)

where $S_T$ is the area of the triangle, and $\Delta t$ is the Lax-Wendroff dissipation coefficient with the dimension of a time.
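
The Lax-Wendroff coefficients can be sketched in the same way. The snippet below is illustrative only: it assumes the form $\beta_i = 1/3 + \Delta t\, k_i / (2 S_T)$, and the function name and the sample values are hypothetical.

```python
def lax_wendroff_coefficients(k, dt, area):
    """Central share of 1/3 plus an upwind bias proportional to k_i."""
    return [1.0 / 3.0 + dt / (2.0 * area) * ki for ki in k]


k = [-0.75, 0.5, 0.25]   # hypothetical upwind parameters, summing to zero
beta = lax_wendroff_coefficients(k, dt=0.4, area=0.5)
print(beta)
```

Because the $k_i$ sum to zero, the coefficients sum to one for any value of the dissipation coefficient, and downstream nodes (positive $k_i$) receive the larger share.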

2.1.2  Schemes  on  quadrilaterals 

The extension of these schemes to quadrilaterals is rather straightforward. The inner scaled normals $\vec{n}_i$ are given in Figure 4. The parameters $k_i$ are calculated as in eqn (18). The distribution coefficients of the LDA and Lax-Wendroff schemes are very similar to those on triangles and will therefore not be repeated.


Fig. 4: Quadrilateral and inward normals $\vec{n}_i$

The quadrilateral N and PSI scheme

The distribution for the quadrilateral N scheme is:

$\beta_i^N \phi^T = k_i^+ (u_i - u_{i,in})$        (26)

Compared with the scheme on triangles, eqn (19), every point has its own inflow state, given by

«i.„  =  [-max(|A;i|,  |A:2|)]“‘ 


[(fci  +  I )  u\  +  W3  "f*  k^  U2  +  kj^  U4j 


“2,„  =  [-max(|A:i|,|f;2|)]  ‘ 


[(^2  +  t^I  I)  U2  +  (^4  +  If^ll)  U4  -F  k^  Ul  -F  A,'3  Usj 

(27) 

Again this scheme is positive but not linearity preserving. The distribution coefficients for the quadrilateral PSI scheme are obtained by applying the limiter function eqn (21) to the N-scheme coefficients $\beta_i^N$.

The quadrilateral FV and limited FV scheme

As on triangles the normals of the dual grid are needed, see Figure 5. The fluctuation is

$\phi^T = k_a (u_4 - u_1) + k_a (u_3 - u_2) + k_b (u_2 - u_1) + k_b (u_3 - u_4)$        (28)

and the distribution to the nodes

$\beta_1^{FV} \phi^T = k_a^- (u_4 - u_1) + k_b^- (u_2 - u_1)$
$\beta_2^{FV} \phi^T = k_a^- (u_3 - u_2) + k_b^+ (u_2 - u_1)$        (29)
$\beta_3^{FV} \phi^T = k_a^+ (u_3 - u_2) + k_b^+ (u_3 - u_4)$
$\beta_4^{FV} \phi^T = k_a^+ (u_4 - u_1) + k_b^- (u_3 - u_4)$

The limited version of this scheme is obtained as before.


Fig. 5: Normals for the quadrilateral FV scheme


2.2  System distribution schemes

As explained in section 1, the two-dimensional supersonic Euler equations can be completely decoupled and the scalar schemes of the previous section can be applied. However, for subsonic flow the two acoustic equations, eqn (8), form an elliptic subset, which cannot be decoupled. One way to treat such a system is to introduce the coupling terms as source terms and distribute them with the LDA or Lax-Wendroff scheme. By doing this the positivity property will be lost and therefore positive system distribution schemes are to be preferred. Among the system distribution schemes we mention the Lax-Wendroff and SUPG distribution. Recently, positive system distribution schemes have been explored, generalizing the scalar FV and N schemes discussed before.

Consider the unsteady hyperbolic system of equations given by

$\frac{\partial W}{\partial t} + A_W \frac{\partial W}{\partial x} + B_W \frac{\partial W}{\partial y} = 0$        (30)

Extending the ideas of the scalar schemes we define the matrices $K_i$ as

$K_i = \frac{1}{2} \left( A_W n_{i,x} + B_W n_{i,y} \right)$        (31)

Because the system is hyperbolic, $K_i$ can be written as

$K_i = R_i \Lambda_i L_i$        (32)

where the columns of $R_i$ contain the right eigenvectors, $\Lambda_i$ is a diagonal matrix of the eigenvalues and $L_i = R_i^{-1}$. The matrices $K_i^+$ and $K_i^-$ are given by

$K_i^+ = R_i \Lambda_i^+ L_i, \qquad K_i^- = R_i \Lambda_i^- L_i.$        (33)

Here $\Lambda_i^+$ contains the positive and $\Lambda_i^-$ the negative eigenvalues.

With these definitions the system schemes on triangles can be obtained just by replacing the scalar $k_i$ by the matrix $K_i$ in the equations (19), (23), (24) and (25). On quadrilaterals the N scheme in the form (26) does not generalize to systems and only system versions of the FV, LDA and Lax-Wendroff schemes can be obtained.


3  CONSERVATIVE LINEARIZATION

We now consider system (30) in conservative form

$\frac{\partial U}{\partial t} + \frac{\partial F}{\partial x} + \frac{\partial G}{\partial y} = \frac{\partial U}{\partial t} + \nabla \cdot \vec{F} = 0$        (34)

To maintain discrete conservation, the cell residual has to be evaluated as the flux balance of the conservative variables over the cell, for a triangle:

$\Phi^T = \oint_{\partial T} \vec{F} \cdot d\vec{n}_{ext}$        (35)

On the other hand, the positive advection distribution schemes require a quasi-linear form of the residual. A conservative linearization is defined such that the quasi-linear form integrated over the surface is identical to the flux integration over the boundaries obtained by a particular integration rule. For the Euler equations on triangles, this is easily achieved by assuming that the Roe parameter vector $Z = \sqrt{\rho}\,(1, u, v, H)^T$ varies linearly over each element. Since $U$, $F$ and $G$ are quadratic in the components of $Z$, the Jacobian matrices $\partial U/\partial Z$, $\partial F/\partial Z$ and $\partial G/\partial Z$ are linear in the components of $Z$, making the integration over a triangle trivial. Defining the average state $\bar{Z}$ over the cell:

$\bar{Z} = \frac{1}{3} \begin{pmatrix} \sqrt{\rho_1} + \sqrt{\rho_2} + \sqrt{\rho_3} \\ \sqrt{\rho_1} u_1 + \sqrt{\rho_2} u_2 + \sqrt{\rho_3} u_3 \\ \sqrt{\rho_1} v_1 + \sqrt{\rho_2} v_2 + \sqrt{\rho_3} v_3 \\ \sqrt{\rho_1} H_1 + \sqrt{\rho_2} H_2 + \sqrt{\rho_3} H_3 \end{pmatrix}$        (36)

the flux balance over element $T$ may be expressed in quasilinear form as

$\Phi^T = \oint_{\partial T} F(Z)\,dy - G(Z)\,dx$        (37)
$\qquad = S_T \left[ \bar{A}\, U_x + \bar{B}\, U_y \right]$        (38)

where $\bar{A}$ and $\bar{B}$ are the analytical flux Jacobians evaluated at the average state $\bar{Z}$:

$\bar{A} = \frac{\partial F}{\partial U}(\bar{Z})$        (39)
$\bar{B} = \frac{\partial G}{\partial U}(\bar{Z})$        (40)

Because the exact Jacobians are used, one can transform (38) into any quasilinear form as long as the transformation matrices are evaluated at the average state $\bar{Z}$.

On quadrilaterals it is more difficult and for the moment a linearization is used which is only exact for parallelograms.

The global update for the system, analogous to the scalar case eqn (14), is then given by

$S_i \frac{dU_i}{dt} = -\sum_T B_i^T S_T \bar{R} \left( A_W \frac{\partial W}{\partial x} + B_W \frac{\partial W}{\partial y} \right)$        (41)
$\qquad\quad = -R(U_i)$        (42)

where $B_i^T$ is the cell distribution matrix.


(a) Structured grid for the NACA-0012, 32 × 128 cells

(c) Entropy distribution on the airfoil

Fig. 6: Mesh, Mach number isolines and entropy distribution on the airfoil for the subsonic NACA-0012 ($M_\infty = 0.63$, $\alpha = 2^\circ$), for the hyperbolic/elliptic splitting. PSI on entropy and total enthalpy, Lax-Wendroff on acoustics.
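
The idea behind the conservative linearization of Section 3 can be checked on a small example, which is an illustrative sketch and not the authors' code. It takes a quantity that is a product of two linearly varying fields (as the momentum $\rho u = Z_1 Z_2$ is a product of two parameter-vector components) and verifies that its exact contour integral equals the quasilinear form evaluated at the vertex average. The function names, the triangle and the nodal values are hypothetical, and counter-clockwise vertex ordering is assumed.

```python
def inward_scaled_normals(xy):
    """Inward normals (edge opposite vertex i) of a counter-clockwise triangle."""
    return [(xy[(i + 1) % 3][1] - xy[(i + 2) % 3][1],
             xy[(i + 2) % 3][0] - xy[(i + 1) % 3][0]) for i in range(3)]


def area(xy):
    (x0, y0), (x1, y1), (x2, y2) = xy
    return 0.5 * ((x1 - x0) * (y2 - y0) - (x2 - x0) * (y1 - y0))


def contour_integral_x(xy, z1, z2):
    """Exact contour integral of z1*z2*n_ext,x (Simpson's rule per edge)."""
    total = 0.0
    for i, (nx, _) in enumerate(inward_scaled_normals(xy)):
        a, b = (i + 1) % 3, (i + 2) % 3
        q_a, q_b = z1[a] * z2[a], z1[b] * z2[b]
        q_mid = 0.5 * (z1[a] + z1[b]) * 0.5 * (z2[a] + z2[b])
        total += (q_a + 4.0 * q_mid + q_b) / 6.0 * (-nx)  # outward normal
    return total


def quasilinear_x(xy, z1, z2):
    """S_T * (z2_bar * dz1/dx + z1_bar * dz2/dx) with vertex-averaged states."""
    normals = inward_scaled_normals(xy)
    s = area(xy)
    dz1dx = sum(z * n[0] for z, n in zip(z1, normals)) / (2.0 * s)
    dz2dx = sum(z * n[0] for z, n in zip(z2, normals)) / (2.0 * s)
    return s * (sum(z2) / 3.0 * dz1dx + sum(z1) / 3.0 * dz2dx)


xy = [(0.0, 0.0), (2.0, 0.0), (0.5, 1.5)]  # hypothetical triangle
z1 = [1.0, 1.2, 0.9]                        # hypothetical linear field (e.g. Z_1)
z2 = [0.5, 0.7, 0.4]                        # hypothetical linear field (e.g. Z_2)

flux_balance = contour_integral_x(xy, z1, z2)
linearized = quasilinear_x(xy, z1, z2)
print(flux_balance, linearized)
```

The two numbers agree to rounding error, which is the property that allows the flux balance to be rewritten in quasilinear form without losing conservation.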


4  NUMERICAL  RESULTS  USING 
EXPLICIT  TIME  STEPPING 

Results of three inviscid computations are given. In Figure 6 the structured mesh, the Mach number isolines and the entropy distribution on the airfoil are shown for the subcritical, $M_\infty = 0.63$, $\alpha = 2^\circ$, flow over a NACA-0012 airfoil. The scalar quadrilateral PSI scheme is used for the convection of entropy and total enthalpy along the streamlines, while the system Lax-Wendroff scheme is used for the coupled acoustic subsystem.

The unstructured mesh, Mach number isolines and the entropy distribution on the airfoil for the transonic NACA-0012, $M_\infty = 0.85$, $\alpha = 1^\circ$, can be found in Figure 7. The distribution scheme is the system PSI scheme, which allows monotonic capturing of the shock in one or two cells.

The third test case is the toughest, namely the hypersonic ($M_\infty = 8.7$), axisymmetric flow around a hyperboloid flare. The mesh, a triangulated structured Navier-Stokes mesh with aspect ratios over 100, and the Mach number isolines are given in Figure 8. The solution is monotonic, the shock is captured very well and the carbuncle phenomenon, seen in Finite Volume solvers with Roe's approximate Riemann solver, is not present. Again the system PSI scheme was used.

5  IMPLICIT  ACCELERATION 

Explicit time-integration of the semi-discrete equations (42), although straightforward and robust, suffers from stability limits for some classes of problems, such as subsonic flows with stagnation regions and viscous flows. Implicit time-integration is in turn less limited by restrictions on the time-step but requires on the other hand large non-linear systems of equations to be solved.

5.1  Time-stepping  strategy 

As we are only interested in the steady state solution, we restrict our attention to the linearized backward Euler time-stepping scheme, which can be written as:

Loop over time (for k = 0, 1, ...) until convergence:

-  Choose time increment $\Delta^k t$,

-  Compute increment $\Delta^k U$ as the solution of:

$\underbrace{\left[ \frac{1}{\Delta^k t} + J_R(U^k) \right]}_{J_f(U^k)} \Delta^k U = -R(U^k),$        (43)

-  Update: $U^{k+1} = U^k + \Delta^k U$,

where $J_R(U) = \partial R / \partial U$ is the Jacobian of the residual $R(U)$, a sparse and non-symmetric matrix, and where $J_f$ denotes the augmented Jacobian $1/\Delta^k t + J_R$. An overview of different approaches to solve the steady state equations can be found in [14].
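
The time-stepping loop can be sketched on a scalar model problem. This is an illustration under stated assumptions rather than the solver used in the paper: the residual `residual(u) = u**3 - 8`, its analytic Jacobian, the initial time step and the doubling of the time step at each iteration are all hypothetical choices, and the linear solve of (43) is trivial because there is a single unknown.

```python
def residual(u):
    """Hypothetical scalar model residual with steady state u = 2."""
    return u ** 3 - 8.0


def jacobian(u):
    """Analytic Jacobian of the model residual."""
    return 3.0 * u ** 2


def linearized_backward_euler(u, dt=0.1, growth=2.0, dt_max=1.0e6,
                              tol=1.0e-12, max_steps=100):
    """Iterate (1/dt + J_R) du = -R, u <- u + du, raising dt as we go."""
    for step in range(max_steps):
        r = residual(u)
        if abs(r) < tol:
            return u, step
        augmented = 1.0 / dt + jacobian(u)  # J_f = 1/dt + J_R
        u += -r / augmented                 # solve the linear system, then update
        dt = min(growth * dt, dt_max)       # larger steps approach Newton's method
    return u, max_steps


u_steady, steps = linearized_backward_euler(1.0)
print(u_steady, steps)
```

For small time steps the increment is damped, while for large time steps the iteration tends to Newton's method on the steady residual.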

At each time-step k, the main ingredients of the algorithm can be listed as:

•  computing a Jacobian matrix $J_R(U^k)$,

•  solving the linear system (43),

•  choosing a time increment $\Delta^k t$ and a non-linear update strategy.

The next three subsections are devoted to the description of each of these tasks.


(a) Triangulated structured grid, 2470 nodes

(b) Mach number isolines, general view and zoom of leading edge

Fig. 8: Mesh and Mach number isolines for the hypersonic, axisymmetric hyperboloid flare, $M_\infty = 8.7$. Hyperbolic/elliptic splitting with the system PSI scheme.


(a) Grid for the transonic NACA-0012, 2355 nodes


(c) Entropy distribution on the airfoil

Fig. 7: Mesh, Mach number isolines and entropy distribution on the airfoil for the transonic NACA-0012 ($M_\infty = 0.85$, $\alpha = 1^\circ$), for the hyperbolic/elliptic splitting. System PSI scheme.


5.2  Jacobian computation

5.2.1  Differentiating the Residual

As the spatial discretization stencil involves only distance-one neighbours, each individual component of the Jacobian can be computed at reasonable cost. Limiting the Taylor expansion of $R_i(U_j + \varepsilon 1_m)$, the nodal residual at node $i$ with the m-th component of $U$ at node $j$ perturbed by a small quantity $\varepsilon$, to the first order terms, one has:

$\left[ \frac{\partial R_i(U)}{\partial U_j} \right]_m \simeq \frac{R_i(U_j + \varepsilon 1_m) - R_i(U)}{\varepsilon}$        (44)

It shows how each entry of the Jacobian $\partial R_i / \partial U_j$, a 4 × 4 matrix with $m$ as the column index, can be computed by a first order finite difference. Because of the compactness of the scheme, this computation requires only twelve (twenty in 3D) additional explicit residual evaluations. Following the same steps as the explicit solver (i.e. loop over the cells, in each cell compute the fluctuation and distribute contributions to be assembled at the nodes), the algorithm to compute the Jacobian is:

Initialize $R(U) = 0$, $J_R(U) = 0$,

Loop over triangles (T = 1, 2, ..., number of cells):

  ◦ Compute fluctuation and distribute contributions to the 3 nodes (i = 1, 2, 3): $R_i \leftarrow R_i + R_i^T(U)$,

  ◦ Loop over the 3 nodes of the cell (j = 1, 2, 3):

    ◦ Loop over the 4 components of $U_j$ (m = 1, 2, 3, 4):

      - Perturb m-th component of $U_j$: $U_j \leftarrow U_j + \varepsilon 1_m$,

      - Compute new fluctuation,

      - Distribute the 3 contributions (i = 1, 2, 3):
        $\left[ \frac{\partial R_i(U)}{\partial U_j} \right]_m \leftarrow \left[ \frac{\partial R_i(U)}{\partial U_j} \right]_m + \left[ R_i^T(U_j + \varepsilon 1_m) - R_i^T(U) \right] / \varepsilon$

where $R_i^T(U_j + \varepsilon 1_m)$ denotes the residual contribution to node $i$ when the m-th component of $U$ at node $j$ has been perturbed.
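
The finite-difference construction can be sketched as follows. This is a simplified illustration, not the cell-by-cell assembly described above: it perturbs one unknown at a time in a dense global residual, and the function names, the model residual and the fixed step `eps` are hypothetical.

```python
def fd_jacobian(residual, u, eps=1.0e-7):
    """First order finite-difference Jacobian, built one column at a time."""
    r0 = residual(u)
    jac = [[0.0] * len(u) for _ in range(len(r0))]
    for m in range(len(u)):
        perturbed = list(u)
        perturbed[m] += eps                      # perturb the m-th unknown
        r1 = residual(perturbed)
        for i in range(len(r0)):
            jac[i][m] = (r1[i] - r0[i]) / eps    # column m of the Jacobian
    return jac


def model_residual(u):
    """Hypothetical residual with two unknowns."""
    return [u[0] ** 2 + u[1] - 3.0, u[0] * u[1] - 2.0]


jac = fd_jacobian(model_residual, [1.0, 2.0])
print(jac)
```

In the cell-based version each perturbation only changes the contributions of the cell being visited, which is why twelve extra fluctuation evaluations per triangle are sufficient in two dimensions.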

A key issue in the numerical computation of the Jacobian as a finite-difference approximation is the proper choice of $\varepsilon$, which can be determined here on a component-by-component basis. The question is treated by Schnabel [15] who advocates:

$\varepsilon = \sqrt{\eta} \, \max\left[ |U_{j,m}|, \mathrm{typ}(U_{j,m}) \right] \mathrm{sign}(U_{j,m}),$        (45)

with $\mathrm{typ}(U_{j,m})$ a typical user-defined order of magnitude for the m-th component of $U$ at node $j$ and $\eta$ a lower bound on the inaccuracy in the evaluation of the residual $R(U)$ (relative noise). This lower bound is at best the machine-epsilon of the computer and can be larger if $R(U)$ is computed by a lengthy piece of code. Should $\eta$ be worse or if $R(U)$ is not differentiable everywhere, one might rather resort to the secant method, known for multidimensional problems as Broyden's update.
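
A minimal sketch of the step-size rule (45) is given below. The function name `fd_step` is hypothetical, the relative noise defaults to the machine epsilon (the best case mentioned in the text), and treating the sign of a zero value as positive is an assumption of this sketch.

```python
import math
import sys


def fd_step(value, typical, eta=sys.float_info.epsilon):
    """Perturbation size sqrt(eta) * max(|value|, typical) * sign(value)."""
    magnitude = max(abs(value), typical)
    sign = -1.0 if value < 0.0 else 1.0   # assumption: sign(0) taken as +1
    return math.sqrt(eta) * magnitude * sign


step_large = fd_step(-250.0, typical=1.0)   # scaled by the value itself
step_small = fd_step(1.0e-12, typical=1.0)  # scaled by the typical magnitude
print(step_large, step_small)
```

The typical magnitude prevents the perturbation from collapsing to zero when a component of the solution happens to be very small.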


5.2.2  Broyden’s  method 

Broyden's update method is the multidimensional extension of the secant method used for univariate problems, avoiding the need for computing any derivative. If the k-th Newton-Raphson step is denoted* by:

$J_R(U^k) \, \Delta^k U = -R(U^k),$

with $\Delta^k U = U^{k+1} - U^k$, the generalization of the one-dimensional secant condition is that $J_R(U^{k+1})$ satisfies:

$J_R(U^{k+1}) \, \Delta^k U = \Delta^k R,$        (46)

where $\Delta^k R = R(U^{k+1}) - R(U^k)$. However, this does not determine $J_R(U^{k+1})$ uniquely in more than one dimension. In Broyden's update approach, $J_R(U^{k+1})$ is chosen by making the least change (see [15] for proper matrix norms) to $J_R(U^k)$, consistent with the condition (46). As such, the method suffers a major drawback as it entails a complete fill-in of the Jacobian matrix whereas the true Jacobian matrix is sparse. Alternatively, we can look for the solution to the same least change problem under the additional condition $B \in S(J_R)$, where $S(J_R)$ represents the set of $n \times n$ matrices with the same sparsity pattern as $J_R$. The resulting update is given by:

$J_R(U^{k+1}) = J_R(U^k) + \mathcal{P}_{S(J_R)} \left\{ D^{-1} \left[ \Delta^k R - J_R(U^k) \, \Delta^k U \right] (\Delta^k U)^T \right\},$

where $\mathcal{P}_S$ is the matrix operator which maps any matrix onto the same matrix but restricted to the sparsity pattern of $J_R$, and $D$ is an $n \times n$ diagonal matrix which accounts for the sparsity structure of the Jacobian matrix:

$D_{ii} = s_i^T s_i$ with $(s_i)_j = \begin{cases} 0 & \text{if } (J_R)_{ij} = 0 \\ (\Delta^k U)_j & \text{otherwise} \end{cases}$        (47)

*The time-step has been omitted from the formulation. However, the argumentation which follows still holds, as backward Euler discretization in time amounts to a classical Newton's method where the increment has been under-relaxed for the update.

Broyden's method allows the Jacobian matrix to be updated without having to compute twelve residual evaluations. On the other hand, non-linear convergence will be at most linear and more iterations will be needed at the non-linear level.
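
The sparse update can be sketched with plain nested lists. This is an illustration only, under the assumption that the update acts row by row as in eqn (47); the function name, the tridiagonal pattern and the sample vectors are hypothetical.

```python
def sparse_broyden_update(jac, pattern, d_u, d_r):
    """Least-change secant update that keeps the sparsity pattern of jac."""
    n = len(jac)
    new_jac = [row[:] for row in jac]
    for i in range(n):
        # s_i: the step restricted to the non-zero columns of row i
        s_i = [d_u[j] if pattern[i][j] else 0.0 for j in range(n)]
        d_ii = sum(s * s for s in s_i)
        if d_ii == 0.0:
            continue  # the step carries no information for this row
        # i-th component of  dR - J dU
        defect = d_r[i] - sum(jac[i][j] * d_u[j] for j in range(n))
        for j in range(n):
            if pattern[i][j]:
                new_jac[i][j] += defect * s_i[j] / d_ii
    return new_jac


# Hypothetical tridiagonal Jacobian and its sparsity pattern.
jac = [[2.0, -1.0, 0.0], [-1.0, 2.0, -1.0], [0.0, -1.0, 2.0]]
pattern = [[True, True, False], [True, True, True], [False, True, True]]
d_u = [0.1, -0.2, 0.3]   # hypothetical solution increment
d_r = [0.5, -0.4, 0.9]   # hypothetical residual increment

new_jac = sparse_broyden_update(jac, pattern, d_u, d_r)
print(new_jac)
```

After the update the secant condition (46) holds row by row, and the entries outside the pattern remain zero.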


5.3  Solution  of  the  linear  system 

Following the linearization process, the linear system (43) is iteratively solved with left (or right) preconditioning:

$\tilde{J}_f(U^k)^{-1} J_f(U^k) \, \Delta^k U = -\tilde{J}_f(U^k)^{-1} R(U^k),$        (48)

with $\tilde{J}_f(U^k)$ obtained by some incomplete approximate factorization of $J_f(U^k)$. Block ILU factorization is used in our numerical experiments. Krylov subspace acceleration techniques have been considered to accelerate the convergence of the iterative solve. In the framework of this paper, we have favoured GMRES [16] among other solvers because of its optimality and since it does not represent a severe limitation for 2D medium size problems on today's computers despite its storage requirements. We refer to [17] for a description and assessment of alternate preconditioners and other Krylov subspace techniques, such as QMR and TFQMR [18]. A constant Krylov subspace dimension of 30 is used in the numerical experiments and the linear solver is stopped when the
normalized  linear  residual  drops  below  10“®.  This  linear 
convergence  criteria  is  easily  met  within  the  30  Krylov 
subiterations  in  the  early  stages  of  the  convergence  pro¬ 
cess  when  the  CFL  number  is  not  too  large. 


5.4  Global  convergence  and  fixed-point 
method 

The choice of an optimal time-step is a key issue to ensure a fast and robust convergence. It seems logical to increase the time-step when approaching the converged solution, as the likelihood of being within the radius of convergence of the Newton method increases. Automatic time-increment control algorithms have been set up to relieve the user from explicitly monitoring the CFL number following the convergence level. Some experiments with such algorithms can be found in [17]. We now present a technique which consists, after some approximation, in accelerating a fixed-point method. The technique never reaches any Newton-like convergence, but shows, for a constant limited CFL number, a good global convergence behaviour. The technique consists in solving the steady-state Euler/Navier-Stokes equations with an infinite CFL number, i.e. full Newton time integration, but using a finite CFL number in the preconditioning matrix at each linearization: $J_f(U^k) = J_R(U^k)$ and $\tilde{J}_f(U^k)$ obtained as some factorization of $1/\Delta^k t + J_R$. It should be pointed out that, since the Krylov subspace dimension is not increased, this also results in solving the linear system less accurately.

The scheme, already used in [19], builds up the main features of the flow at the very early stages of the convergence process much faster than the classical backward Euler discretization in time. Asymptotically though, the method shows a monotonic linear convergence behaviour and never reaches the convergence rate, possibly quadratic, of backward Euler. The method therefore appears complementary to backward Euler, as it can be used for the first non-linear iterations and so provide a well-featured initial guess for backward Euler.

The scheme can be viewed as an accelerated fixed-point method. The basic implicit technique consists in a simple relaxation procedure immediately followed by a non-linear update. The relaxation procedure is based on some approximation $\tilde{J}_f(U^k)$ of the augmented Jacobian of the residual $J_f(U^k) = 1/\Delta^k t + J_R(U^k)$, and the overall process reads:

$U^{k+1} = \mathcal{H}(U^k) = U^k - \tilde{J}_f^{-1}(U^k) \, R(U^k).$

This formulation can be seen as a particular case of the linearized backward Euler time-stepping where only one single iteration is performed at the linear level, with $\tilde{J}_f(U^k)$ as the preconditioning matrix. Then, let us define $G(U) = U - \mathcal{H}(U)$ and apply Newton-GMRES to solve $G(U) = 0$:

$U^{k+1} = U^k + \Delta^k$ with $J_G(U^k) \, \Delta^k = -G(U^k).$        (49)

If $J_G$ is approximated by $\tilde{J}_f^{-1} J_R$ (which is only true at the non-linear convergence), one has:

$\tilde{J}_f^{-1}(U^k) \, J_R(U^k) \, \Delta^k = -\tilde{J}_f^{-1}(U^k) \, R(U^k),$

which is nothing else than full Newton iterations to solve $R(U) = 0$, where the system arising at each linearization has been left-preconditioned by $\tilde{J}_f^{-1}$. In practice, the technique indeed amounts to adding $1/\Delta^k t$ only in the preconditioning matrix.

Numerical experiments have shown that the accelerated fixed-point method requires CFL numbers of order O(1) to O(10).
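
The basic (unaccelerated) fixed-point map can be sketched on a small model problem. This is only an illustration: the two-unknown residual is hypothetical, and the diagonal of $1/\Delta t + J_R$ is used as a crude stand-in for the incomplete factorization, which is an assumption of this sketch rather than the preconditioner used in the paper. The Newton-GMRES acceleration itself is not reproduced.

```python
def residual(u):
    """Hypothetical residual with steady state u = (1, 1)."""
    return [4.0 * u[0] + u[0] ** 3 - u[1] - 4.0,
            4.0 * u[1] + u[1] ** 3 - u[0] - 4.0]


def approximate_augmented_diagonal(u, dt):
    """Diagonal of 1/dt + J_R, standing in for the incomplete factorization."""
    return [1.0 / dt + 4.0 + 3.0 * ui ** 2 for ui in u]


def fixed_point_map(u, dt):
    """One relaxation sweep followed by the non-linear update: U - J~^-1 R(U)."""
    r = residual(u)
    d = approximate_augmented_diagonal(u, dt)
    return [ui - ri / di for ui, ri, di in zip(u, r, d)]


u = [0.0, 0.0]
history = []
for k in range(60):
    u = fixed_point_map(u, dt=1.0)
    history.append(max(abs(r) for r in residual(u)))
print(u, history[0], history[-1])
```

The residual decreases by a roughly constant factor per iteration, i.e. the convergence is linear, which is consistent with the behaviour described for the method before any acceleration is applied.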


Fig. 9: Subcritical flow over a NACA-0012: Iso-Mach contours


5.5  Numerical  results 

Numerical results are presented for subsonic and transonic viscous computations. Tests were performed on a DEC Alpha AXP 3000/400 workstation. The subcritical flow around a NACA-0012 airfoil at a free-stream Mach number $M_\infty$ of 0.63 and 2° angle of attack is first considered. The grid is made of 5249 cells with far-field boundary conditions located 50 chords away from the body. Iso-Mach contours are depicted in Fig. 9. The space discretisation used the hyperbolic/elliptic splitting model and a detailed view of the grid is shown in Fig. 7.


Implicit  time  integration  was  performed  by  updating  the 
Jacobian  with  Broyden’s  formula,  with  a  maximum  CFL 
number  of  200.  Convergence  history  is  shown  in  Fig.  10 
and  was  achieved  in  about  750  CPU-seconds.  In  com¬ 
parison,  about  40000  CPU  sec  were  needed  to  reduce  the 
residual  to  10“®  using  explicit  Euler  time-stepping. 


Fig. 10 : Subcritical flow over a NACA-0012: Convergence
history obtained with Broyden's update, 750 CPU sec


The second test case is the viscous flow over the same
airfoil with $M_\infty = 2.0$ and $Re = 106$, which belongs to
the test suite of the GAMM workshop on compressible viscous
flow solvers ([20]). Fig. 11 shows the density contours of
the solution computed with the hyperbolic/elliptic splitting
model, and the convergence history is depicted in Fig. 12.
Convergence starting from a uniform flow field, with the
accelerated fixed-point method for the first two iterations
followed by backward Euler, is achieved within about 12
iterations and 350 CPU-seconds. For backward Euler,
the initial CFL number of 100 was increased at every
iteration by a factor C2 = 2.0 up to 10®. GMRES with a
Krylov subspace dimension of 50 was used for this test.
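
The CFL ramp used for the backward Euler iterations can be sketched in a few lines of Python. This is only an illustration: the function name `cfl_schedule` and the cap `cfl_max = 1.0e4` are assumptions made for the example (the upper bound is not legible in the text), while the initial value 100 and the growth factor 2.0 are those quoted for this test case.

```python
def cfl_schedule(cfl_initial, growth_factor, cfl_max, n_iterations):
    """Return the CFL number used at each backward Euler iteration."""
    schedule = []
    cfl = cfl_initial
    for _ in range(n_iterations):
        schedule.append(cfl)
        # Multiply by the growth factor, but never exceed the cap.
        cfl = min(cfl * growth_factor, cfl_max)
    return schedule


# Hypothetical cap of 1e4, chosen only so that the ramp saturates in the example.
ramp = cfl_schedule(cfl_initial=100.0, growth_factor=2.0,
                    cfl_max=1.0e4, n_iterations=10)
```

With these values the ramp doubles from 100 until it reaches the cap and then stays there.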


Fig. 11 : NACA-0012, $M_\infty = 2.0$, $Re = 106$: Density
isoline contours

Fig. 12 : NACA-0012, $M_\infty = 2.0$, $Re = 106$: Convergence
history, 350 CPU sec


6  MESH  ADAPTIVITY 

In [21], it was proposed to use the residual decomposition
technique developed in the context of multidimensional
upwind methods as a tool to extend the SUPG method
to compressible flows. This idea was shown to lead to
improved performance and robustness compared to the
standard system extensions of SUPG [22, 23].

In the present section, we report the continuation of this
work with a focus on mesh adaptivity [24, 25], and we
show that the multidimensional residual decompositions
introduced to generalize the SUPG scheme to hyperbolic
systems allow an error estimation procedure for the Euler
equations to be derived in a very natural and inexpensive way.


6.1  SUPG  a  posteriori  error  estimate 

The main ingredient of the proposed error estimation is
the a posteriori error estimate developed by Johnson and
Eriksson [26, 27] for the SUPG scheme applied to the
following convection-diffusion equation:

$$A \cdot \nabla u - \nabla \cdot (\kappa \nabla u) = f \quad \text{in } \Omega, \qquad (50)$$

with Dirichlet boundary conditions on the boundary $\Gamma$
of the computational domain $\Omega$. If we assume that the
advection vector $A$ is constant, the a posteriori error
estimate for the scalar shock capturing SUPG scheme applied
to the stationary problem (50) can be written from [27]
as:

||d-l'||/.2(n)  <  C||min(l,«“‘/j^)/J(U)|U2(rj)-finaxK^/*  , 

(51) 

where 


R{U)  =1  \-VU-f  I  -f  max 
seercn 


.  dU 
dns 


j  h  on  T  G  T  , 


(52) 

with $T$ a triangle of mesh $\mathcal{T}$, $\kappa$ the artificial viscosity of
the SUPG scheme and $n_S$ the normal to side $S$ of $T$. Note
that, for simplicity, the computed solution $U$ is compared
with the solution $\hat{u}$ of a perturbed continuous
advection-diffusion problem obtained by replacing $\kappa$ by
$\kappa(U)$ in eq. (50). In general, $\|u - \hat{u}\|$ is expected
to be dominated by $C\|\hat{u} - U\|$, where $C$ is a constant,
so that control of $\|\hat{u} - U\|$ suffices.


6.2  Extension  of  the  error  estimation  to 
the  Euler  equations 

Once we are equipped with such a reliable and efficient
error estimate, it is quite natural to apply this error
estimate to each individual scalar equation resulting from a
residual decomposition step as described in section 1. Let
$\phi_k$ be the scalar fluctuation corresponding to equation $k$
and $\phi_k^T$ the contribution of triangle $T$ to this fluctuation.
We consider then the following adaptive algorithm:
Given a tolerance TOL and an initial triangulation $\mathcal{T}_0$,
determine successively triangulations $\mathcal{T}_j$ with $N_j$ elements,
mesh spacings $h_j = h_j(\phi_k)$ and corresponding approximate
solutions $U_j$ ($j = 1, \ldots, J$), such that $h_j$ is maximal
under the local condition, for $k = 1, \ldots, 4$:


I,  -  /■,  ~-i  i2  ^d'fc(f^-l)l 

ChjW  min(l,«;_,_i/ij_i) - - 1 


< 


TOL 

\/^ 


(53) 


on $T \in \mathcal{T}_{j-1}$, until (on the final mesh) the global norm is
such that:

C||min(l,K7‘hj)  <  TOL  (54) 

which  is  the  stopping  criterion.  Notice  that  (53)  seeks  to 
equidistribute  the  contribution  from  each  element  to  the 
global  error  bound. 

From the adaptivity criterion (53), one can isolate $h_j$ for
each triangle $T$, which provides us with a new "reference"
size for each triangle. Then, it is easy to decide whether
a given triangle has to be refined, coarsened or kept as it
is. Of course, when dealing with the 2D Euler equations,
one can compute four different required mesh sizes $h_j(\phi_k)$
for the next triangulation $\mathcal{T}_j$. At that point, several
options can be taken. It could be decided for instance to
control the error only on one of the 4 equations, but this
is risky because one could miss some of the flow features
which are not "seen" by the corresponding variable. Our
preferred choice therefore consists in taking the minimum
of the four mesh sizes,

$$h_j = \min_{k=1,\ldots,4} h_j(\phi_k), \qquad (55)$$

which ensures an equation-by-equation control of the error
over the mesh under the required tolerance TOL.
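
The selection rule (55) and the subsequent refine/coarsen/keep decision can be illustrated with a short Python sketch. The function names and the coarsening threshold `coarsen_ratio` are assumptions introduced for this example only; the text states only that the minimum of the four sizes is taken and that each triangle is then refined, coarsened or kept.

```python
def target_size(sizes_per_equation):
    """Eq. (55): the most demanding of the required mesh sizes h_j(phi_k)."""
    return min(sizes_per_equation)


def classify(h_current, h_target, coarsen_ratio=2.0):
    """Decide what to do with one triangle.

    Refine when the required spacing is smaller than the current one;
    coarsen only when it is larger by the (assumed) factor coarsen_ratio.
    """
    if h_target < h_current:
        return "refine"
    if h_target > coarsen_ratio * h_current:
        return "coarsen"
    return "keep"


# Four required sizes, one per scalar equation of the 2D Euler system.
h_new = target_size([0.4, 0.1, 0.3, 0.2])
decision = classify(h_current=0.2, h_target=h_new)
```

Taking the minimum is the conservative choice: a feature visible in only one of the four scalar equations still triggers refinement.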


6.3  Adaptivity  technique 

The adaptivity technique developed in the present research
is inspired by the innovative work of Richter [28].
It consists in non-hierarchical h-refinement/derefinement
allowing efficient mesh optimization operations such as
edge swapping and Laplacian smoothing.

The refinement operation is achieved by the introduction
of an additional node for each edge of an element
for which the calculated spacing is less than the element
parameter h. For interior edges, the additional node is
placed at the mid-point of the edge and the solution at
the new vertex is interpolated from the solution at the
extremities, whereas for boundary edges, the geometrical
location of the new node is determined through a spline
interpolation involving the four closest existing points.
For any edge that is subdivided in this manner, the two
adjacent triangles associated with this edge both have to be
divided in order to preserve the consistency of the final
grid.
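
As a minimal sketch of the interior-edge case, the following Python function inserts the mid-point node and interpolates the solution there. The data layout (plain lists of coordinates and nodal values) and the name `split_interior_edge` are assumptions for illustration, and the simple average used for the solution is one possible reading of "interpolated from the solution at the extremities". The spline placement for boundary edges and the division of the two adjacent triangles are not reproduced here.

```python
def split_interior_edge(coords, values, a, b):
    """Insert a node at the mid-point of edge (a, b).

    coords: list of (x, y) node positions, extended in place.
    values: list of nodal solution values, extended in place.
    Returns the index of the new node.
    """
    xa, ya = coords[a]
    xb, yb = coords[b]
    coords.append((0.5 * (xa + xb), 0.5 * (ya + yb)))
    # Linear interpolation of the solution between the two end nodes.
    values.append(0.5 * (values[a] + values[b]))
    return len(coords) - 1


coords = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0)]
values = [1.0, 3.0, 5.0]
new_node = split_interior_edge(coords, values, 0, 1)
```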

Our coarsening strategy is based on the use of a non-hierarchical
data structure which enables the deletion of
nodes of the initial grid and the use of the structural
optimization techniques described below. The coarsening
is achieved in two steps. First, given the set of elements
flagged to be deleted, a list of nodes to remove is constructed.
Then, the deletion of these nodes is performed
simultaneously with the reconnection of the remaining
nodes to obtain a conformal mesh. This is done by identifying
each element involved in the coarsening with one
of the three possible derefinement cases: triangles with
1, 2 or 3 nodes to be deleted (see fig. 13).
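
The identification of the derefinement case of each triangle amounts to counting how many of its vertices are in the removal list. The sketch below assumes triangles stored as tuples of node indices and a set of nodes to delete; both the layout and the function name are illustrative, and the reconnection step itself is not shown.

```python
def derefinement_case(triangle, nodes_to_delete):
    """Number of vertices of `triangle` that are scheduled for deletion.

    A result of 1, 2 or 3 selects one of the three derefinement cases;
    0 means the triangle is not involved in the coarsening.
    """
    return sum(1 for node in triangle if node in nodes_to_delete)


triangles = [(0, 1, 2), (1, 2, 3), (3, 4, 5), (6, 7, 8)]
to_delete = {2, 3, 4, 5}
cases = [derefinement_case(t, to_delete) for t in triangles]
```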


Fig. 13 : Derefinement cases and associated treatments


After the adaption step itself, a series of mesh optimization
operations are performed in order to improve the
quality of the grid. The first operation consists in a standard
Laplacian smoothing modified in order to reduce the
clustering around nodes with degree lower than 6 and the
dispersion of nodes around nodes with degree higher than
6. The second operation is an edge swapping procedure
which aims at minimizing the number of nodes with a
high degree. This increases the number of nodes which
have an optimal degree. The final operation consists in
setting a minimum value to the degree of the nodes by
removing undesirable low degree configurations as shown
in fig. 14.
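
For reference, a plain (unmodified) Laplacian smoothing sweep is sketched below in Python. It moves each free node towards the centroid of its neighbours. The degree-dependent modification mentioned above is not reproduced, since its exact form is not given here; the data layout, the `relaxation` parameter and the function name are assumptions for the example.

```python
def laplacian_smooth(coords, neighbours, fixed, n_sweeps=1, relaxation=1.0):
    """Standard Laplacian smoothing.

    coords:     list of (x, y) node positions.
    neighbours: dict mapping a node index to the indices of its neighbours.
    fixed:      set of node indices that must not move (e.g. boundary nodes).
    Returns a new list of positions; the input list is left untouched.
    """
    points = [tuple(p) for p in coords]
    for _ in range(n_sweeps):
        updated = list(points)
        for i, nbrs in neighbours.items():
            if i in fixed or not nbrs:
                continue
            cx = sum(points[j][0] for j in nbrs) / len(nbrs)
            cy = sum(points[j][1] for j in nbrs) / len(nbrs)
            x, y = points[i]
            # Move the node towards the centroid of its neighbours.
            updated[i] = (x + relaxation * (cx - x), y + relaxation * (cy - y))
        points = updated
    return points


# Four fixed corners of a unit square and one off-centre interior node.
square = [(0.0, 0.0), (1.0, 0.0), (1.0, 1.0), (0.0, 1.0), (0.8, 0.3)]
smoothed = laplacian_smooth(square, {4: [0, 1, 2, 3]}, fixed={0, 1, 2, 3})
```

After one sweep with full relaxation the interior node sits at the centroid of the four corners.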


Fig.  14  :  Three  "pathological"  low  degree  node  configu¬ 
rations  and  their  associated  treatments 


For more details about the adaptivity technique we refer
the reader to [24].


(b) Mach number isolines

Fig. 15 : Mesh adaptivity for transonic NACA-0012
($M_\infty = 0.85$, $\alpha = 1°$), Scalar shock-capturing
SUPG scheme associated with the hyperbolic/elliptic
splitting, TOL = 0.10
6.4 Numerical results

The transonic flow around a NACA-0012 airfoil ($M_\infty = 0.85$,
$\alpha = 1°$) is computed. The initial mesh is a coarse
triangulation with 587 nodes and 1094 elements obtained
with the frontal Delaunay method by Muller et al. [29].
The constant C appearing in (53) was chosen equal to 1,
the tolerance level TOL was fixed at TOL = 0.10 and the
error estimation was performed on all equations. Three
adaption steps were needed before meeting the
stopping criterion. Fig. 15 shows the final mesh as well as
the Mach number isolines of the corresponding solution.
The final mesh (fig. 15a) indicates clearly that all features
of the flow, i.e. the stagnation zone and expansions near
the leading edge, the two normal shocks and the slip line
emanating from the trailing edge, have been detected.
1. SUMMARY

Genuinely  multidimensional  upwind  dissipation  models  are 
developed  for  the  2D/3D  Euler/Navier-Stokes  equations  using 
a  cell-centered  finite-volume  approach  on  structured  grids. 
The  numerical  flux  is  formulated  using  the  artificial 
dissipation  concept.  An  overview  is  given  for  2D/3D  compact 
upwind  dissipation  for  stencils  up  to  respectively  6  and  8 
points.  A  classification  is  set  up  for  first  and  second  order 
accurate  schemes  that  have  respectively  minimum  and  zero 
cross  diffusion.  Second  order  monotone  schemes  are 
developed  using  the  concept  of  non-linear  limiter  functions 
applied  on  multidimensional  ratios  of  flux  differences.  A 
classification  is  presented  for  different  families  of  2D  ratios. 
3D  multidimensional  limiters  based  on  3D  ratios  of  flux 
differences  are  introduced.  The  scalar  dissipation  models  are 
extended  and  applied  to  the  Euler/Navier-Stokes  equations 
based  on  a  characteristic  decomposition  of  the  inviscid 
operator.  The  resulting  characteristic  compatibility  equations 
consisting  of  convective  and  source  terms  are  depending  on  a 
set  of  3  propagation  directions.  An  overview  is  given  for 
different  choices  of  directions.  The  multidimensional 
discretisation  is  considered  for  both  the  convective  and  source 
terms  along  its  associated  advective  speed. 

2.  INTRODUCTION 

In  the  last  ten  years  extensive  research  has  been  ongoing 
towards  the  development  of  genuinely  multidimensional 
upwind  schemes.  The  main  motivation  is  to  reduce  the  mesh 
dependency  appearing  in  classical  dimensional-split  schemes 
and  as  a  result  to  capture  the  physics  more  accurately.  Two 
main  approaches  are  found  in  literature:  the  fluctuation 
splitting  schemes  and  the  finite  volume  schemes,  for  a  review 
see 

The fluctuation splitting schemes consist of an upwind
distribution of a fluctuation (residual) over the nodes of a
triangular or tetrahedral cell. In the finite
volume methods the numerical flux is determined using
multidimensional extrapolation. Application of
both methods to the Euler/Navier-Stokes equations consists of
two basic elements: (1) a suitable wave modelling or
characteristic decomposition of the inviscid operator
and, (2) a scalar convection scheme.

The concept of artificial dissipation associated with central
schemes became a key element in Euler and Navier-Stokes
calculations during the last 15 years. The family of upwind
schemes, which can be considered as a rational way of
defining dissipation in a numerical algorithm, has led to a
matrix dissipation form, as opposed to scalar dissipation.
One of the essential elements of the upwind dissipation is the
concept of non-linear limiters, leading to high resolution, 2nd
order schemes, satisfying some condition of monotonicity,
such as the one-dimensional concept of 'Total Variation
Diminishing' (TVD) schemes.

Very recently a more formal approach towards a general
formulation of artificial dissipation terms, applicable to
structured as well as unstructured meshes, has been developed,
based on the concept of Local Extremum Diminishing (LED)
schemes, by way of generalised limiters. All these
developments however still remain in the dimensional
splitting approach.

In this framework, 2D multidimensional upwind schemes have
been reformulated as a way of defining dissipation terms, with
the requirements of positivity and classical limiter
concepts. In contrast to the dimensional-split models, the
multi-D dissipation depends on the direction of the convection
speed and on variations of the solution or fluxes in different
mesh directions. The corresponding numerical flux for a cell
face is determined by a multidimensional interpolation inside
an upstream triangle. As a result the multi-D dissipation is
more compact than the models with a one-dimensional
interpolation along the mesh lines. Recently a comparison and
unification was performed for the underlying scalar linear and
non-linear positive convection schemes for both the finite
volume and fluctuation methods.

The idea of multidimensional limiters was first introduced for
a 2D scalar convection problem. Different classes of 2D
limiters have been classified and applied to the 2D
Euler/Navier-Stokes equations. In the present paper an
overview is given for compact 2D convection schemes for
stencils up to 6 points. Different classes of 2D ratios are
determined by the choice of i) a triangular interpolation
domain and ii) variations along meshlines or diagonals. The
analysis is extended for 3D convection schemes as basis for
the development of dissipation models including 3D limiters
and ratios. A classification is given concerning first and
second order schemes with respectively minimum and zero
cross diffusion for stencils up to 8 points.

The scalar dissipation models are extended to the
Euler/Navier-Stokes equations based on a characteristic
decomposition of the inviscid operator. The resulting
characteristic compatibility equations represent the convection
of an entropy, a shear and 2 acoustic waves. They consist of
convective and source terms that depend on a set of 3
propagation directions. An overview is given of different
strategies concerning the choice of the directions. The
resulting equations are discretized using the scalar dissipation
models. The multi-D dissipation models are considered for
both the convective and source terms based on their associated
advective speed.


Paper presented at the AGARD FDP Symposium on “Progress and Challenges in CFD Methods and Algorithms”
held in Seville, Spain, from 2-5 October 1995, and published in CP-578.
3. DIMENSIONAL-SPLIT UPWIND DISSIPATION
3.1  Numerical  flux  formulation 

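Consider a 3D scalar hyperbolic conservation law based on
the fluxes $\vec{F} = (f, g, h)$,

$$\frac{\partial u}{\partial t} + \nabla \cdot \vec{F} = \frac{\partial u}{\partial t} + \frac{\partial f}{\partial x} + \frac{\partial g}{\partial y} + \frac{\partial h}{\partial z} = 0 \qquad (1)$$

with $\vec{a} = (a, b, c) = (\partial f/\partial u, \partial g/\partial u, \partial h/\partial u)$ the convection speed. A
cell-centered conservative finite-volume semi-discretization of
(1) on a Cartesian mesh yields,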

$$\frac{du_{i,j,k}}{dt} + \frac{f^*_{i+1/2,j,k} - f^*_{i-1/2,j,k}}{\Delta x}$$

3.2  Monotonicity  condition 

To prevent oscillations, $L$ is limited by use of a non-linear
limiter function $\Phi$, in order to fulfil monotonicity conditions.
It ensures that local maxima cannot increase and local minima
cannot decrease. The approach in ref. is used and has
recently been defined as the Local Extremum Diminishing (LED)
condition. Rewriting the residual of (2),


du. 


i,i.k 

dt 


=  -  Res.  ,  =  Z  c, 

i,|,k  ^  Imji 

i.ifl.ii 


^*^i+l.j+m.k+n 


(7) 


the  positivity  condition  is  defined  by 


^  0  V  l,m,n 


(8) 


$$+ \frac{g^*_{i,j+1/2,k} - g^*_{i,j-1/2,k}}{\Delta y} + \frac{h^*_{i,j,k+1/2} - h^*_{i,j,k-1/2}}{\Delta z} = 0 \qquad (2)$$

where e.g. the numerical flux on cell face i+1/2,j,k is
expressed by

$$f^*_{i+1/2,j,k} = \frac{1}{2}\left(f_{i,j,k} + f_{i+1,j,k}\right) - d_{i+1/2,j,k} \qquad (3)$$

consisting of a central part being the average of the fluxes in
the cell-centers left and right to the cell face. The numerical
dissipation on cell face i+1/2,j,k is represented by $d_{i+1/2,j,k}$.
All classical central and upwind dimensional-split dissipation
models can be formulated as a function


$$d_{i+1/2,j,k} = d\left(\ldots, \delta u_{i-1/2,j,k}, \delta u_{i+1/2,j,k}, \delta u_{i+3/2,j,k}, \ldots\right) \qquad (4)$$

depending on 1D consecutive differences of the solution along
the corresponding mesh line with e.g. $\delta u_{i+1/2,j,k} = u_{i+1,j,k} - u_{i,j,k}$.
For example consider the 1st and 2nd order upwind Flux
Difference Splitting schemes (FDS1, FDS2) where the
dissipation (4) is specified by, for $a > 0$,


The first term in (5) is a diffusive contribution and
corresponds to first order upwinding (FDS1, $L = 0$). Function $L$
represents an antidiffusive term that introduces higher order
accuracy,

L=  ^  |a|  5Uj+l/2,j.k  ’^’'i+l/2,j,k^ 


with flux limiter $\Phi$ depending on a 1D ratio based on the sign
of $a$ as shown in figure 1, with

$$r_{i+1/2,j,k} = \frac{\delta u_{i-1/2,j,k}}{\delta u_{i+1/2,j,k}} \qquad (6b)$$


Figure  1  Second  order  dimensional-split  upwind 
dissipation  (a>0). 
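
A minimal Python sketch of the dimensional-split numerical flux for the linear flux f = a u with a > 0 follows. It assumes the standard flux-limited form, in which the dissipation is d = ½|a| δu_{i+1/2} (1 − Φ(r)) with r the ratio of the upstream to the local difference; this is an assumption about how the diffusive and antidiffusive parts combine, and the function and argument names are illustrative.

```python
def numerical_flux(u_im1, u_i, u_ip1, a, limiter):
    """Flux on face i+1/2 for f = a*u with a > 0: central part minus dissipation."""
    du_plus = u_ip1 - u_i    # difference across the face i+1/2
    du_minus = u_i - u_im1   # upstream difference across the face i-1/2
    # Degenerate case du_plus == 0: fall back to the first order scheme.
    r = du_minus / du_plus if du_plus != 0.0 else 0.0
    central = 0.5 * a * (u_i + u_ip1)
    dissipation = 0.5 * abs(a) * du_plus * (1.0 - limiter(r))
    return central - dissipation


def fds1(r):
    """First order upwind: no antidiffusive term."""
    return 0.0


def fds2(r):
    """Linear second order upwind: limiter equal to the ratio itself."""
    return r


flux_first = numerical_flux(1.0, 2.0, 4.0, 2.0, fds1)
flux_second = numerical_flux(1.0, 2.0, 4.0, 2.0, fds2)
```

With Φ = 0 the flux reduces to a·u_i, and with Φ = r it becomes a·(u_i + ½(u_i − u_{i−1})), the one-sided second order extrapolation.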


For scheme (5)-(6) this is fulfilled if the flux limiter $\Phi$
satisfies

$$0 \le \Phi(r) \le \min(2, 2r), \qquad \Phi\!\left(\frac{1}{r}\right) = \frac{\Phi(r)}{r} \qquad (9)$$

Conditions (9) are valid for all classical TVD limiters. In 1D
the monotonicity concept is equivalent to the Total
Variation Diminishing (TVD) approach.
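
The limiter conditions can be checked numerically for a few classical limiters. The sketch below tests the bounds 0 ≤ Φ(r) ≤ min(2, 2r) (with Φ = 0 taken for r ≤ 0, the usual convention) and the symmetry property Φ(1/r) = Φ(r)/r. The choice of minmod, van Leer and superbee, and the sample ratios, are illustrative assumptions rather than the limiters used in the paper.

```python
def minmod(r):
    return max(0.0, min(1.0, r))


def van_leer(r):
    return (r + abs(r)) / (1.0 + abs(r))


def superbee(r):
    return max(0.0, min(2.0 * r, 1.0), min(r, 2.0))


def within_bounds(limiter, samples, tol=1e-12):
    """Check 0 <= limiter(r) <= min(2, 2r), with limiter = 0 for r <= 0."""
    return all(
        0.0 <= limiter(r) <= max(0.0, min(2.0, 2.0 * r)) + tol
        for r in samples
    )


def is_symmetric(limiter, samples, tol=1e-12):
    """Check limiter(1/r) == limiter(r)/r for positive ratios."""
    return all(
        abs(limiter(1.0 / r) - limiter(r) / r) <= tol
        for r in samples if r > 0.0
    )


ratios = [-1.0, 0.1, 0.25, 0.5, 1.0, 2.0, 4.0, 10.0]
report = {
    f.__name__: (within_bounds(f, ratios), is_symmetric(f, ratios))
    for f in (minmod, van_leer, superbee)
}
```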

4.  MULTIDIMENSIONAL  UPWIND  DISSIPATION 
4.1  2D  Scalar  Upwind  Dissipation 

In the following an overview is given for compact 2D scalar
upwind schemes, including linear and non-linear classes
having first and second order monotone schemes. This study is
based on a theoretical analysis of 2D linear convection
schemes of which the basic elements are in ref. It is a
generalisation of the analysis of ref. where only first order
optimum schemes are considered. This general study is set up
for cell-centered molecules with a finite volume and structured
approach. It is based on a general 9-point stencil that is
derived in Cartesian and streamline coordinates. Conditions
concerning second order accuracy, cross diffusion,
monotonicity and relations between some of these are
investigated.

In contrast to the classical dimensional-split dissipation, the
multidimensional upwind dissipation is based on variations of
the solution in different mesh directions and on the total
convection speed (a,b). The domain of dependence of the
multi-D upwind dissipation models is taken in an upstream
direction to the convection speed.

In the following, the linear form of (1) is considered on a
uniform mesh with mesh spacing Δx = Δy = 1. The fluxes are
f = au and g = bu with constant convection speeds a,b > 0. A
general form of the multi-D upwind dissipation on face
i+1/2,j can be written as

$$d_{i+1/2,j} = \frac{1}{2}|a|\,\delta u_{i+1/2,j} + L(\delta u_x, \delta u_y) \qquad (10)$$

The first term is the classical first order upwind dissipation
from equation (5). The second term L represents the multi-D
dissipation that is a function of differences of the solution in
both mesh directions.

Two options have been investigated for the choice ($\delta u_x$, $\delta u_y$)
in (10) leading to compact 2D upwind schemes, and are
illustrated in figure 2. Both variations in x and y-direction
determine a triangular domain of dependence for the multi-D
artificial dissipation.
4.1.1 Linear 6-point upwind schemes
For the linear subclass of (10) the numerical flux is
determined by a linear interpolation in the corresponding
triangles of figure 2, e.g. for configuration (I),

$$f^*_{i+1/2,j} = f_{i,j} - \alpha\,\delta u_{i,j-1/2} + \beta\,\delta u_{i+1/2,j-1} \qquad (11)$$

Using the definition of the numerical flux (3), the multi-D
upwind dissipation model is determined from (11),

$$d_{i+1/2,j} = \frac{1}{2}|a|\,\delta u_{i+1/2,j} + \alpha\,\delta u_{i,j-1/2} - \beta\,\delta u_{i+1/2,j-1} \qquad (12)$$

with positive interpolation coefficients α and β depending on
a and b. Similar formulas are valid for the fluxes on cell faces
i,j±1/2 introducing analogous coefficients δ and γ.


Figure 2 Interpolation domain I and II for the scalar
multi-D upwind dissipation (a,b>0)


Interesting to notice is the sign of the multi-D contributions in
(12). The term based on coefficient β and defined in the same
direction as the 1st order term reduces the dissipation as for
classical higher order schemes, while the term depending on
α in the other mesh direction increases the dissipation (12).
This addition of dissipation is not a loss of accuracy; on the
contrary, it reduces the diffusion in the cross flow direction as
shown in ref.

Writing out the residual, the resulting 6-point families are
determined by 3 parameters A = α + δ, β and γ. Figure 3 shows
for both configurations the interpolation triangles for the four
cell faces.

Configuration (I) (figure 3a) consists of only 3 triangles
because the interpolation domains for cell faces i-1/2,j and
i,j-1/2 are identical. Configuration (II) in figure 3b consists of 4
triangles where the continuously shaded areas refer to
cell faces i±1/2,j.

Both families of schemes (figure 3) have subclasses of 2nd
order accurate schemes, illustrated in figure 4. The subclass of
zero cross diffusion schemes, or second order accurate
schemes for the homogeneous convection equation, is a
two-parameter family for both options I and II. A comparative
study between the fluctuation splitting unstructured
multidimensional upwind schemes of Deconinck and
co-workers and the present dissipation models
shows that the Low Diffusion schemes A and B
(LDA and LDB) are 5-point zero cross diffusion schemes of
the 6-point family of configuration (II) (figure 3b).

The 5-point continuous interpolation scheme (config. I)
mentioned in figure 4 is based on a continuous interpolation
for the numerical flux inside the polygon formed by the
6 surrounding cell centers of the cell-face.

A more severe constraint is the condition for general second
order accuracy, in the classical sense, defining a unique
member of the class of compact zero cross diffusion second
order schemes for the non-homogeneous convection equation.
Notice that for configuration I the scheme is an upwind
scheme while for configuration II the scheme is the classical
central scheme.

4.1.2  Linear  4-point  upwind  schemes 
Both 6-point families have a subclass of 4-point stencils in
common with the choice of β = γ = 0 in (12) and figure 3,
yielding the numerical dissipation for a,b > 0

$$d_{i+1/2,j} = \frac{1}{2}|a|\,\delta u_{i+1/2,j} + L(\delta u_{i,j-1/2}) \qquad (13)$$

with $L(\delta u_{i,j-1/2}) = \alpha\,\delta u_{i,j-1/2}$

The resulting 4-point schemes are actually a one-parameter
(A = α + δ) family of schemes, although the parameters α and δ
can be chosen independently. The general 4-point scheme is
split into a central part and a dissipation term:


(b)  config.  (II) 


Figure  3  Six-point  linear  schemes,  config.  I  and  II 
$$\mathrm{Res}_{i,j} = \frac{1}{2}\left(a\,\delta_x + b\,\delta_y\right)u_{i,j} - D_{i,j} \qquad (14)$$

where the upwind dissipation is formulated as

$$D_{i,j} = \frac{1}{2}\left(a\,\delta_x^+\delta_x^- + b\,\delta_y^+\delta_y^- + 2A\,\delta_x^-\delta_y^-\right)u_{i,j} \qquad (15)$$

by use of $\delta$, $\delta^+$ and $\delta^-$ that represent respectively central,
forward and backward differences. The first 2 terms in (15)
represent the 1st order dimensional-split upwind dissipation.
The additional mixed 2nd difference term represents the
multidimensional dissipation. The parameter A determines
the amount of multidimensional upwind dissipation.


Several interesting schemes are recovered by choosing a
particular value of A as illustrated in figure 5. Concerning the
subclass of 4-point monotone first order schemes, the lower
limit (A=0) corresponds to the 1st order classical upwind
scheme that has maximum cross diffusion (FDS1). The upper
limit (A=min(a,b)) corresponds to the minimum cross
diffusion scheme. This scheme has been investigated in
different formulations. The 4-point family
has a unique non-monotone zero cross-diffusion scheme
being second order accurate for the homogeneous convection
equation. Since this scheme is a linear second order
scheme it cannot be monotone and shows oscillations near
discontinuities.
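
The role of the parameter A can be made concrete with a small Python sketch. Writing the 4-point residual for a,b > 0 on a unit mesh as a sum of differences towards the west, south and south-west neighbours gives the coefficients a − A, b − A and A, so that positivity holds for 0 ≤ A ≤ min(a,b). The cross diffusion expression used below, ab(a + b − 2A) / (2(a² + b²)), is derived here from a Taylor expansion of that residual and is not quoted from the paper; the function names are illustrative.

```python
def four_point_coefficients(a, b, A):
    """Coefficients c in Res = sum_n c_n * (u_ij - u_n), for a, b > 0."""
    return {"west": a - A, "south": b - A, "south_west": A}


def is_monotone(a, b, A):
    """Positivity: every coefficient must be non-negative."""
    return all(c >= 0.0 for c in four_point_coefficients(a, b, A).values())


def cross_diffusion(a, b, A):
    """Numerical diffusion normal to the streamline (unit mesh spacing)."""
    return 0.5 * a * b * (a + b - 2.0 * A) / (a * a + b * b)


a, b = 1.0, 2.0
summary = {
    "upwind": (is_monotone(a, b, 0.0), cross_diffusion(a, b, 0.0)),
    "min_cross": (is_monotone(a, b, min(a, b)),
                  cross_diffusion(a, b, min(a, b))),
    "zero_cross": (is_monotone(a, b, 0.5 * (a + b)),
                   cross_diffusion(a, b, 0.5 * (a + b))),
}
```

Under these assumptions the cross diffusion decreases linearly in A and vanishes at A = (a + b)/2, which lies outside the monotone range unless a = b, in line with the statement that the zero cross diffusion scheme is not monotone.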


Figure  4  Overview  2nd  order  6-point  upwind  schemes 


Figure  5  2D  4-point  upwind  schemes 
For more details about the 2D convection schemes and
dissipation including 4-point, 5-point and 6-point schemes, see
e.g. ref.


In  ref.  monotonicity  conditions  on  the  coefficients  in  (17) 
are  derived  for  the  4  classes.  All  2D  ratios  found  in  literature 
e  g  2.';,26,27,.t2  fjj  gigjggg 


4.1.3  2D  Multidimensional  Limiters 

The idea of multidimensional limiters was first introduced by
Sidilkover in the framework of a scalar convection
problem. Hirsch and Van Ransbeeck
considered various multidimensional dissipation formulations
based on positivity and classical limiters. To illustrate the
methodology the unique 4-point zero cross diffusion scheme
(fig. 5) is considered below.

The definition of multidimensional limiters follows the 1D
methodology. Starting with a 1st order monotone scheme,
limited antidiffusive terms are added. One of the main
differences with the dimensional-split limiters is that as 1st
order scheme the minimum cross diffusion scheme from fig. 5
is selected, having a higher accuracy than the classical 1st
order scheme. The 2nd order dissipation is
rewritten as the first order dissipation plus an anti-diffusive
limited correction term,

$$d^{(2)}_{i+1/2,j} = d^{(1)}_{i+1/2,j} + L(\delta u_x, \delta u_y) \qquad (16)$$

with $L = \Delta\alpha\,\delta u_{i,j-1/2}\,\Phi(r_{i+1/2,j})$

$\Delta\alpha = \alpha^{(2)} - \alpha^{(1)} = \frac{1}{2}\left(b - \min(a,b)\right)$

where Δα represents the difference in interpolation coefficient
between the 2nd and 1st order scheme. Near discontinuities the
limiter is switched off (Φ = 0) and the 1st order multi-D
dissipation is applied. In smooth regions Φ = 1 and then the
linear 2nd order scheme is applied.
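
The switching behaviour of the limiter can be sketched as a blend of interpolation coefficients. The Python fragment below assumes α⁽¹⁾ = ½ min(a,b) for the minimum cross diffusion scheme and α⁽²⁾ = ½ b for the zero cross diffusion scheme, which is one reading of Δα consistent with the text; these values and the function name are assumptions for illustration.

```python
def limited_alpha(a, b, phi):
    """Interpolation coefficient alpha for a limiter value phi in [0, 1]."""
    alpha_first = 0.5 * min(a, b)   # assumed 1st order (minimum cross diffusion)
    alpha_second = 0.5 * b          # assumed 2nd order (zero cross diffusion)
    delta_alpha = alpha_second - alpha_first
    return alpha_first + phi * delta_alpha


near_shock = limited_alpha(1.0, 2.0, 0.0)   # limiter switched off
smooth = limited_alpha(1.0, 2.0, 1.0)       # limiter fully on
```

When b ≤ a the two coefficients coincide under this reading, so the limiter has no effect on this face.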

The definition of the multi-D ratio in (16) and the
corresponding variations $\delta u_x$ and $\delta u_y$ are related to the choice
of triangular interpolation domains I or II from figure 2. The
following definition of a general 2D ratio is used,

c,  5u  +  c,  5u 

_  '  X  A  y 


and is related to the choice of a triangular interpolation. Two
triangle configurations are shown in figure 6. For each
configuration two options are considered when fixing $\delta u_y$,
with the numerator of (17) taken as a variation along the
x-direction or a variation along the diagonal.


4.2  3D  Scalar  Upwind  Dissipation 

In the following a brief overview is given of a theoretical
analysis of 3D linear convection schemes of which the basic
elements are in ref. and which will be discussed in more
detail elsewhere. This study is based on the extension of the
2D analysis discussed in section 4.1. A general form of the
multi-D upwind dissipation on face i+1/2,j,k, similar to (10),
can be written as

$$d_{i+1/2,j,k} = \frac{1}{2}|a|\,\delta u_{i+1/2,j,k} + L(\delta u_x, \delta u_y, \delta u_z) \qquad (18)$$

The first term is the classical first order upwind dissipation
from equation (5). The second term L represents the multi-D
dissipation that is a function of differences of the solution in
the three mesh directions.


4.2.1  Linear  8-point  upwind  schemes 
In the following, the linear form of (1) is considered on a
uniform mesh with mesh spacing Δx = Δy = Δz = 1. The fluxes
are f = au, g = bu and h = cu with constant convection speeds
a,b,c > 0. The 8-point molecules, for a,b,c > 0, are defined by the
following extrapolation formula on e.g. face i+1/2,j,k,

Ci/2.j,i<  =  -  ^2)  -  Px(^o  -  “4)  -Tx(U2  -  Ua)  (19) 

referring to figure 7 for the overall configuration of the
scheme. Using the definition of the numerical flux (3), the
multi-D upwind dissipation model (18) is determined from
(19),

$$d_{i+1/2,j,k} = \frac{1}{2}|a|\,\delta u_{i+1/2,j,k} + L(\delta u_{i,j-1/2,k}, \delta u_{i,j,k-1/2}, \delta u_{i,j-1,k-1/2}) \qquad (20)$$

$$L = \alpha_x\,\delta u_{i,j-1/2,k} + \beta_x\,\delta u_{i,j,k-1/2} + \gamma_x\,\delta u_{i,j-1,k-1/2}$$

with interpolation coefficients $\alpha_x$, $\beta_x$ and $\gamma_x$ depending on a,
b and c. Similar formulas are valid for the fluxes on cell faces
i,j±1/2,k and i,j,k±1/2 using respectively the sets ($\alpha_y$, $\beta_y$, $\gamma_y$)
and ($\alpha_z$, $\beta_z$, $\gamma_z$).


Figure  6  Four  classes  of  2D  multi-d  ratios 
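$$D_{i,j,k} = \frac{1}{2}\left(a\,\delta_x^+\delta_x^- + b\,\delta_y^+\delta_y^- + c\,\delta_z^+\delta_z^-\right)u_{i,j,k} + \left(A\,\delta_x^-\delta_y^- + B\,\delta_y^-\delta_z^- + C\,\delta_x^-\delta_z^- - D\,\delta_x^-\delta_y^-\delta_z^-\right)u_{i,j,k} \qquad (23)$$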


Figure  7  3D  upwind  schemes  for  8-point  stencils 

It  is  important  to  observe  that  the  resulting  molecule  actually 
represents  a  four-parameter  family  of  schemes  when  the 
following  set  of  parameters  is  chosen 

A  =  +  cx^  + 

B  =  P,  +  P. 

C  =  Px  +  Tx  +  “.  +  X. 

D  =  Yx  +  Yy  +  \  (21) 

Based on (19)-(21) the following general 8-point scheme is
recovered, which is split into a central part and a dissipation
term:

$$\mathrm{Res}_{i,j,k} = \frac{1}{2}\left(a\,\delta_x + b\,\delta_y + c\,\delta_z\right)u_{i,j,k} - D_{i,j,k} \qquad (22)$$

where the upwind dissipation is formulated as


The first 3 terms in (23) represent the 1st order upwind dissipation. The additional terms represent the multidimensional dissipation, which consists of mixed 2nd and 3rd differences. The parameters A, B, C and D determine the amount of multidimensional upwind dissipation. Choosing a specific scheme corresponds to fixing the 4 parameters A, B, C and D. Since the cell face values are determined by 9 interpolation coefficients, every scheme has 5 degrees of freedom in choosing the interpolation coefficients in (21).

4.2.2  First  order  monotone  schemes 

Figure  8  shows  a  classification  for  the  8-point  family  of 
upwind  schemes  including  monotonicity  and  zero  cross 
diffusion  conditions.  The  lower  limit  of  the  monotonicity 
condition  corresponds  with  the  classical  first  order  upwind 
scheme  with  a  maximum  amount  of  cross  diffusion.  The  upper 
limit  corresponds  to  the  minimum  cross  diffusion  scheme  also 
identified  in  ref.^^.  In  the  2D  case  (e.g.  c=0)  this  scheme 
reduces  to  the  2D  minimum  cross  diffusion  scheme  of  fig.  5 
investigated  before  in  e.g.  ref.^'^A2.‘5_  Different  interpolation 
strategies  are  investigated  in  ref. 

4.2.3  Subclass  of  zero  cross  diffusion  schemes 
Evaluating  the  zero  cross  diffusion  condition  in  fig.8  one  can 
notice  that  there  is  no  condition  on  parameter  D.  As  a  result 
there  is  a  one-parameter  subclass  of  zero  cross  diffusion 
schemes.  Different  interpolation  strategies  in  (20)  lead  to 
different  values  of  D.  The  arithmetic  average  procedure 
corresponds  with  the  scheme  used  by  Roe  and  Sidilkover  as 
starting  point  in  their  theoretical  analysis  for  first  order 


Figure  8  Classification  of  3D  upwind  schemes  for  8-point  stencils 
optimum  linear  schemes  in  two  and  three  dimensions  in  ref.^‘*. 
In  the  present  approach  we  choose  the  scheme  that 
corresponds  with  the  value  of  D  which  is  identical  with  the 
value  for  the  first  order  minimum  cross  diffusion  scheme: 
D=min(a,b,c). 


4.2.4  3D  Multidimensional  Limiters 
In this section the 2D multidimensional limiters discussed before are extended to the 3D upwind schemes. The 2nd order zero cross diffusion scheme related to D=min(a,b,c) (fig. 8) is rewritten as the first order minimum cross diffusion scheme plus a limited anti-diffusive correction term.


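$$ +\ (\Delta\alpha_x\,\delta u_{i,j-1/2,k} + \Delta\beta_x\,\delta u_{i,j,k-1/2}) $$

with $\Delta\alpha_x = \alpha_x^{(2)} - \alpha_x^{(1)} = \tfrac{1}{2}\,(b - \min(a,b))$

$\Delta\beta_x = \beta_x^{(2)} - \beta_x^{(1)} = \tfrac{1}{2}\,(c - \min(a,c))$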

Note that only one limiter is applied in (24). An alternative possibility would be to add a different limiter/ratio to each component of the correction term. Near discontinuities the limiter is switched off (Φ=0) and the 1st order multi-D dissipation is applied. In smooth regions Φ=1 and the linear 2nd order scheme is applied.
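The switching behaviour described above can be sketched as follows. This is only an illustration under stated assumptions: the limiter is taken to be minmod (the limiter used in the results section), the coefficient differences are read as Δα_x = ½(b − min(a,b)) and Δβ_x = ½(c − min(a,c)), and all function names are hypothetical.

```python
def minmod_limiter(r):
    """Minmod limiter: 0 for r <= 0, r for 0 < r < 1, 1 for r >= 1."""
    return max(0.0, min(1.0, r))


def delta_coefficient(speed_own, speed_other):
    """Assumed reading: 0.5*(speed_other - min(speed_own, speed_other))."""
    return 0.5 * (speed_other - min(speed_own, speed_other))


def limited_correction(phi, d_alpha, d_beta, du_y, du_z):
    """Anti-diffusive correction scaled by a single limiter value phi."""
    return phi * (d_alpha * du_y + d_beta * du_z)


if __name__ == "__main__":
    a, b, c = 1.0, 3.0, 0.5
    d_alpha = delta_coefficient(a, b)   # non-zero only when b > a
    d_beta = delta_coefficient(a, c)    # zero here because c < a
    for ratio in (-1.0, 0.5, 3.0):
        phi = minmod_limiter(ratio)
        print(ratio, phi, limited_correction(phi, d_alpha, d_beta, 0.2, 0.1))
```

With Φ=0 the correction vanishes and the first order minimum cross diffusion dissipation remains; with Φ=1 the full correction of the linear 2nd order scheme is applied.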

Notice that this definition of L is not the same as for the linear multi-D models (20) because the reference dissipation has been changed to the minimum cross diffusion scheme instead of the classical 1st order upwind scheme. The definition of the 3D ratio is based on the variations $\delta u_y$ and $\delta u_z$ in the correction term of (24) and some extra variation in the third direction. Thus for face i+1/2,j,k a variation $\delta u_x$ is introduced in the 3D ratio,

$$ r_{i+1/2,j,k} = \frac{c_1\,\delta u_x + c_2\,\delta u_y + c_3\,\delta u_z}{\Delta\alpha_x\,\delta u_y + \Delta\beta_x\,\delta u_z} \qquad (25) $$

Equation (25) has the same form as the definition of a 3D ratio in the formulation of a new fluctuation splitting scheme in ref.^^. Different possibilities can be considered for $\delta u_x$, as shown in ref. where three different classes of 3D ratios are defined. Each definition corresponds with the variations in a tetrahedron constructed by the three variations along the x-, y- and z-axis. For more details concerning monotonicity conditions see ref.

5.  EXTENSION  FOR  THE  EULER/NS  EQUATIONS 

The conservative form of the 3D Navier-Stokes equations is written as:

$$ \frac{\partial U}{\partial t} + \frac{\partial}{\partial x}(F - F_v) + \frac{\partial}{\partial y}(G - G_v) + \frac{\partial}{\partial z}(H - H_v) = 0 \qquad (26) $$

with conservative variables $U = (\rho, \rho u, \rho v, \rho w, \rho E)^T$, the inviscid fluxes (F, G, H) and the viscous fluxes $(F_v, G_v, H_v)$. The latter are approximated by a central discretization and will not be considered below. Application of the multidimensional upwind dissipation models from section 4 to the inviscid fluxes consists of 3 consecutive steps:

1)  Characteristic decomposition of the Euler system

2)  Multi-D discretisation of the characteristic equations

3)  Re-transformation to conservative numerical flux.


5.1.  Characteristic Decomposition

5.1.1  Characteristic  variables/  compatibility  equations 
The 2D Euler equations are expressed by

$$ \frac{\partial U}{\partial t} + \vec{A}\cdot\nabla U = 0 \qquad (27) $$

where $\vec{A} = (A, B)$ are the Jacobian matrices. The eigenvalues of the matrix $K = \vec{A}\cdot\vec{\kappa}$ associated with an arbitrary unit propagation direction $\vec{\kappa}$ largely define the behaviour of the solutions to the Euler equations. Wave-like solutions exist if the eigenvalues of K are real and the corresponding eigenvectors are linearly independent. The latter define a similarity transformation which diagonalizes matrix K,

$$ P^{-1}(\vec{A}\cdot\vec{\kappa})P = \Lambda \qquad (28) $$

with the left eigenvectors being the rows of $P^{-1}$, the right eigenvectors being the columns of P and the diagonal matrix Λ consisting of the eigenvalues,

$$ \lambda^{(1)} = \lambda^{(2)} = \vec{v}\cdot\vec{\kappa}, \quad \lambda^{(3)} = \vec{v}\cdot\vec{\kappa} + c, \quad \lambda^{(4)} = \vec{v}\cdot\vec{\kappa} - c \qquad (29) $$
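The eigenvalue structure can be checked numerically with a short sketch. As an assumption made here for brevity, the check uses the Jacobians in primitive variables (ρ, u, v, p) rather than the conservative ones of the text; the two are related by a similarity transformation, so the eigenvalues are the same. The function names and the right eigenvectors written below are illustrative choices, not the paper's matrix P.

```python
import math


def k_matrix(rho, u, v, c, kx, ky):
    """K = A*kx + B*ky for the 2D Euler equations in primitive variables."""
    vk = u * kx + v * ky
    return [
        [vk, rho * kx, rho * ky, 0.0],
        [0.0, vk, 0.0, kx / rho],
        [0.0, 0.0, vk, ky / rho],
        [0.0, rho * c * c * kx, rho * c * c * ky, vk],
    ]


def matvec(m, x):
    """Dense matrix-vector product with plain lists."""
    return [sum(mij * xj for mij, xj in zip(row, x)) for row in m]


def eigenpairs(rho, u, v, c, kx, ky):
    """Eigenvalues and one choice of right eigenvectors of K."""
    vk = u * kx + v * ky
    return [
        (vk, [1.0, 0.0, 0.0, 0.0]),                               # entropy wave
        (vk, [0.0, ky, -kx, 0.0]),                                # shear wave
        (vk + c, [rho, c * kx, c * ky, rho * c * c]),             # acoustic wave
        (vk - c, [rho, -c * kx, -c * ky, rho * c * c]),           # acoustic wave
    ]


def max_eigen_residual(rho, u, v, c, kx, ky):
    """Largest entry of K r - lambda r over the four eigenpairs."""
    k = k_matrix(rho, u, v, c, kx, ky)
    worst = 0.0
    for lam, r in eigenpairs(rho, u, v, c, kx, ky):
        kr = matvec(k, r)
        worst = max(worst, max(abs(a - lam * b) for a, b in zip(kr, r)))
    return worst


if __name__ == "__main__":
    angle = 0.7  # arbitrary propagation direction
    print(max_eigen_residual(1.0, 0.5, 0.3, 1.0, math.cos(angle), math.sin(angle)))
```

The residual is at round-off level for any unit direction, which reflects that the eigenvalues are real for every propagation direction.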

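Using the left eigenvectors, a set of characteristic variables can be constructed,

$$ \delta W = P^{-1}\,\delta U \quad \text{or} \quad \delta U = P\,\delta W = \sum_{k=1}^{4} \delta w^{(k)}\, r^{(k)} \qquad (30) $$

or

$$
\begin{aligned}
\delta w^{(1)} &= \delta\rho - \delta p/c^2 \\
\delta w^{(2)} &= \vec{l}_1\cdot\delta\vec{v} + \mu\,(\delta\rho - \delta p/c^2) \\
\delta w^{(3)} &= \vec{\kappa}_2\cdot\delta\vec{v} + \delta p/(\rho c) \\
\delta w^{(4)} &= -\vec{\kappa}_3\cdot\delta\vec{v} + \delta p/(\rho c)
\end{aligned}
\qquad (31)
$$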

with μ being a free parameter. Eq. (31) is not the only possible definition of characteristic variables, but the above choice is well suited to our purpose and is based on 3 arbitrary propagation directions,

$$ \vec{\kappa}_i = (\kappa_{ix}, \kappa_{iy}) = (\cos\theta_i, \sin\theta_i), \quad \vec{l}_i = (\kappa_{iy}, -\kappa_{ix}) \quad \text{for } i = 1,2,3 \qquad (32) $$

In order to identify appropriate wave decompositions, the characteristic variables are defined by different propagation directions; $w^{(1)}$ and $w^{(2)}$ are related to $\vec{\kappa}_1$, and $w^{(3)}$ and $w^{(4)}$ are related to $\vec{\kappa}_2$ and $\vec{\kappa}_3$ respectively. Multiplying eq. (27) by the matrix $P^{-1}$ and introducing the characteristic variables (30)-(31) leads to the characteristic compatibility equations:

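$$ \frac{\partial W}{\partial t} + P^{-1}AP\,\frac{\partial W}{\partial x} + P^{-1}BP\,\frac{\partial W}{\partial y} = 0 \qquad (33) $$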


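or after working out (33) explicitly,

$$
\begin{aligned}
\frac{\partial w^{(1)}}{\partial t} + \vec{v}\cdot\nabla w^{(1)} &= 0 \\
\frac{\partial w^{(2)}}{\partial t} + \vec{v}\cdot\nabla w^{(2)} + \frac{1}{\rho}\,\vec{l}_1\cdot\nabla p &= 0 \\
\frac{\partial w^{(3)}}{\partial t} + (\vec{v} + c\,\vec{\kappa}_2)\cdot\nabla w^{(3)} + c\,\vec{l}_2\cdot(\vec{l}_2\cdot\nabla)\vec{v} &= 0 \\
\frac{\partial w^{(4)}}{\partial t} + (\vec{v} - c\,\vec{\kappa}_3)\cdot\nabla w^{(4)} + c\,\vec{l}_3\cdot(\vec{l}_3\cdot\nabla)\vec{v} &= 0
\end{aligned}
\qquad (34)
$$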
The corresponding 4-wave model consists of one entropy wave, one shear wave and two acoustic waves. The first two terms in each equation of (34) represent the convection of the associated wave in the characteristic direction,

$$ \vec{a}_c^{(1)} = \vec{a}_c^{(2)} = \vec{v}, \quad \vec{a}_c^{(3)} = \vec{v} + c\,\vec{\kappa}_2, \quad \vec{a}_c^{(4)} = \vec{v} - c\,\vec{\kappa}_3 \qquad (35) $$


obtained  in  a  laminar  boundary  layer  on  very  coarse 
meshes^®. 

•  Convection  of  entropy  and  enthalpy 

The first characteristic direction $\vec{\kappa}_1$ is taken perpendicular to the velocity


The subscript c in (35) refers to the convective part. The third terms in (34) are the coupling or source terms, and their presence results from the fact that the Jacobian matrices in (27) are not simultaneously diagonalizable by matrix P. Notice that the coupling terms also show an advective behaviour associated with the directions,

$$ \vec{a}_s^{(2)} = \vec{l}_1, \quad \vec{a}_s^{(3)} = \vec{l}_2, \quad \vec{a}_s^{(4)} = \vec{l}_3 \qquad (36) $$


(40) 


Using  the  definition  of  specific  entropy  and  total  enthalpy, 


6S  =  — (^-6p),  5H: 
P 


^-r7(-^--^)  +  v.8v  (41) 
PY->  c2  y 


the  first  two  characteristic  equations  of  (34)  are  rewritten  as 


which are the normal directions to the propagation directions $\vec{\kappa}_i$. The subscript s refers to the source terms.

5.1.2  Propagation  directions 

The choice of the propagation directions $\vec{\kappa}_1, \vec{\kappa}_2, \vec{\kappa}_3$ with related normals $\vec{l}_1, \vec{l}_2, \vec{l}_3$ in (34) is still to be defined. A main constraint on these directions comes from the factor $\vec{\kappa}_1\cdot(\vec{\kappa}_2+\vec{\kappa}_3)$ that shows up in the denominator of P. To prevent an ill-defined transformation this factor should be maximized. Other conditions to impose on the design of the propagation directions are continuity from the subsonic to the supersonic flow range and robustness. Different possibilities of propagation directions have been examined.

•  Diagonalization approach

The source terms in the system of compatibility equations (34) can be eliminated by the following set of propagation directions,

$$ \vec{l}_1\cdot\nabla p = 0, \quad \vec{l}_2\cdot(\vec{l}_2\cdot\nabla)\vec{v} = 0, \quad \vec{l}_3 = \vec{l}_2 \qquad (37) $$

The first direction is taken along the pressure gradient while the second and third directions are taken equal and defined by the strain rate tensor. The use of this set of directions, which depends on gradients of the solution, shows a lack of robustness in Euler calculations.

•  Combination pressure gradient/velocity

In non-uniform regions all three directions are taken along the pressure gradient,

$$ \vec{l}_1\cdot\nabla p = 0, \quad \vec{l}_3 = \vec{l}_2 = \vec{l}_1 \qquad (38) $$

In smooth regions a continuous switch between the pressure gradient (38) and the streamline direction is introduced. This model shows good accuracy in both the subsonic and supersonic regimes. A better convergence behaviour than with (37) is obtained, especially in supersonic flow. In some cases (e.g. subsonic flows) convergence can only be obtained by freezing the directions after a residual drop of 1 or 2 orders of magnitude.

•  Streamline  direction 

A much simpler choice is taking the directions along the streamline

$$ \vec{l}_1\cdot\vec{v} = 0, \quad \vec{l}_3 = \vec{l}_2 = \vec{l}_1 \qquad (39) $$

This choice seems to have good convergence behaviour but has poor accuracy, especially in supersonic inviscid flows near discontinuities. On the contrary, very good results are


9wl'l 

“ar 


0 


3w<^) 

Ot 


^  - 


^(hII 


+  -7^)'^sl  =  0 


(42) 


Eq. (42) shows that in steady state the entropy and total enthalpy are constant along the streamline, see also ref. 17,18,19 (where μ=0). The 2nd equation of (42) can further be simplified by choosing the parameter μ as $-c^2/(\rho\,\|\vec{v}\|\,(\gamma-1))$, leading to

^  =  0 
at  ycy 


(43) 


As a result the Euler system (34) is split into a hyperbolic part that represents the convection of entropy and enthalpy along the streamline in steady state (43) and a remaining acoustic subsystem with source terms, as in the hyperbolic/elliptic splitting in ref.*°>‘°.

•  Mach angle splitting

In the framework of the fluctuation splitting schemes a Mach angle splitting was developed. The first direction is taken perpendicular to the velocity and the 2nd and 3rd directions are respectively perpendicular to the positive and negative characteristics or Mach lines,

$$ \theta_1 = \theta + \frac{\pi}{2}, \quad \theta_2 = \theta_1 + \mu, \quad \theta_3 = \theta_1 - \mu \qquad (44) $$

with θ the flow angle and $\mu = \arctan(1/\sqrt{M^2-1})$ the Mach angle. A fully decoupled system of characteristic equations is obtained in steady state,

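$$ \vec{v}\cdot\nabla S = 0, \quad \vec{v}\cdot\nabla H = 0, \quad (\vec{v} + c\,\vec{\kappa}_2)\cdot\nabla R_3 = 0, \quad (\vec{v} - c\,\vec{\kappa}_3)\cdot\nabla R_4 = 0 \qquad (45) $$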

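using the steady Riemann variables:

$$ \delta R_3 = \delta w^{(3)} + \frac{c}{\|\vec{v} + c\,\vec{\kappa}_2\|}\,(\vec{l}_2\cdot\delta\vec{v}) \qquad (46a) $$

$$ \delta R_4 = \delta w^{(4)} + \frac{c}{\|\vec{v} - c\,\vec{\kappa}_3\|}\,(\vec{l}_3\cdot\delta\vec{v}) \qquad (46b) $$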
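The geometry of the Mach angle splitting can be sketched numerically. The sketch assumes that (44) reads θ1 = θ + π/2, θ2 = θ1 + μ, θ3 = θ1 − μ, with μ the usual Mach angle arctan(1/√(M²−1)); the function name is hypothetical. Under that reading the convection speeds v + cκ2 and v − cκ3 of (45) are perpendicular to κ2 and κ3, i.e. they lie along the Mach lines.

```python
import math


def machangle_directions(u, v, c):
    """Return the unit vectors kappa_1, kappa_2, kappa_3 for supersonic flow.

    Assumed reading of (44): theta_1 = theta + pi/2,
    theta_2 = theta_1 + mu, theta_3 = theta_1 - mu.
    """
    mach = math.hypot(u, v) / c
    if mach <= 1.0:
        raise ValueError("the Mach angle is only defined for supersonic flow")
    theta = math.atan2(v, u)                       # flow angle
    mu = math.atan(1.0 / math.sqrt(mach * mach - 1.0))  # Mach angle
    theta1 = theta + 0.5 * math.pi
    angles = (theta1, theta1 + mu, theta1 - mu)
    return [(math.cos(t), math.sin(t)) for t in angles]


if __name__ == "__main__":
    u, v, c = 2.0, 1.0, 1.0
    k1, k2, k3 = machangle_directions(u, v, c)
    # kappa_1 is perpendicular to the velocity.
    print(u * k1[0] + v * k1[1])
    # (v + c*kappa_2) . kappa_2 and (v - c*kappa_3) . kappa_3 should vanish.
    print((u + c * k2[0]) * k2[0] + (v + c * k2[1]) * k2[1])
    print((u - c * k3[0]) * k3[0] + (v - c * k3[1]) * k3[1])
```

All three printed values are zero up to round-off, which is the property that decouples the steady characteristic equations.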
•  Pseudo-Mach angle splitting

This model is an algebraic continuation of the supersonic Mach angle decomposition into the subsonic range with continuity at M=1, developed in the fluctuation splitting approach 17,18,19. The corresponding directions are,

0,  =  e,,  02=6  ±^±|,  03  =  6,  (47) 

with $\mu = \arctan(1/\sqrt{|M^2-1|})$ defined as the pseudo-Mach angle. Notice the 2 sets of propagation directions leading to 2 splittings of the residual. For the final residual the average of both splittings is taken.

Considerable  research  is  still  being  performed  to  identify  the 
most  suitable  directions,  see  for  instance'^-''’  for  a  recent 
survey. 

5.2.  Numerical  Flux  Formulation 

The space operators in the characteristic compatibility equations (34), (43) and (45) discussed in section 5.1 are discretised. In the case that there is no source term, the space operator is expressed by

no source term: $\vec{a}^{(k)}\cdot\nabla w^{(k)}$ or $\vec{a}^{(k)}\cdot\nabla R^{(k)}$

where the gradient acts on the characteristic variable or a steady Riemann invariant. When a source term is present, the space operator can be written as,

with source term: $\vec{a}_c^{(k)}\cdot\nabla w^{(k)} + \vec{a}_s^{(k)}\cdot\nabla s^{(k)}$

where the convective and source terms are written as an advection of respectively a characteristic variable and a 'source' variable along the associated directions (35) and (36). In both cases the source and convective terms have the same form and can be treated by the same multi-D scheme or dissipation model.

In  the  formulation  used  in  previous  work'®’ '7,3 1, 33 
scalar  multi-D  dissipation  models  were  applied  only  to  the 
convective  terms  while  the  source  terms  were  discretised  by  a 
central  approximation  without  artificial  dissipation.  In  the 
present  approach  also  the  source  terms  can  be  treated  with  a 
multi-D  scheme  based  on  the  associated  speed  (36). 

Discretising both convective and source terms leads to two numerical fluxes or dissipations for every scalar equation. Next the scalar multi-D discretisation is re-transformed to the conservative residual by use of the right eigenvectors. The resulting inviscid numerical flux on cell face i+1/2,j is defined by

=  +  +  (50) 

where $d_c$ and $d_s$ represent respectively the scalar multi-D dissipation of the convective and source part for each of the 4 characteristic equations. The old formulation, where the convective term is treated by a multi-D scheme and the source term by a central scheme without artificial dissipation, is easily recovered by setting the dissipation for the source term in (50) to zero.


6.  RESULTS 

6.1.  2D  supersonic  Laval  nozzle 

The inviscid supersonic flow in a Laval nozzle is calculated at a Mach number of 2.91 on an H-type mesh with 128x32 cells. The first order minimum cross diffusion scheme (4MCD) and the 5-point continuous zero cross diffusion scheme (5ZCD) combined with the minmod limiter and the ratio of subclass (la) are investigated. The multi-D schemes are compared with the classical 2nd order Flux Difference Splitting scheme (FDS2) with minmod limiter. The classical scheme is also tested on a finer mesh of 256x64 cells, as reference solution. The extension to the Euler equations is based on the 2D characteristic decomposition (34) with the 3 characteristic directions defined by the Mach angle splitting. Both the convective and source terms of (34) are discretised with the same scalar multi-D dissipation model.

Figure 9 shows the isomachlines for the 4 solutions. The first order multi-D scheme performs well in comparison with the classical 2nd order scheme up to the 2nd reflection of the shock structure. The 2nd order multi-D scheme is superior to the classical scheme on the same mesh. It compares very well with the reference solution on the finer mesh. Figures 10 and 11 show respectively the Mach number and total temperature distribution along the symmetry-axis. The total temperature or total enthalpy should be constant in the whole field. The errors for the multi-D schemes are much smaller than for the classical results, even on the finer mesh.

Figure 12 shows the convergence history. Both first and second order 2D results show a good convergence behaviour, obtained with a 3 level multigrid acceleration combined with a 5-stage Runge-Kutta procedure and residual smoothing with a CFL of respectively 10.0 and 8.0.

6.2.  3D  supersonic  corner  flow 

An inviscid supersonic corner flow (M=3.0) is considered, which is generated by two unswept compression ramps with 9.5 deg. wedge angle as illustrated in figure 13. The first order 3D minimum cross diffusion scheme is tested in comparison with classical first and second order (minmod limiter) upwind schemes on a uniform mesh with 32x32x32 cells. The accuracy of the 3D scheme is investigated for both 3D and 2D flow phenomena appearing in this testcase. The extension to the Euler equations is performed using the 3D extension of the characteristic variables (31) and equations (34), see ref. The three characteristic directions $\vec{\kappa}_1, \vec{\kappa}_2, \vec{\kappa}_3$ are taken along the pressure gradient direction. When the pressure gradient goes to zero a blending is performed with the velocity direction.

Figure  14  shows  the  convergence  history.  No  freezing  of  the 
directions  was  needed  to  reach  convergence  with  the  3D 
scheme.  Convergence  is  obtained  with  the  use  of  multigrid 
acceleration  and  residual  smoothing  with  a  5  stage  Runge 
Kutta  procedure  with  CFL  =10. 

Isomach lines are shown in figure 15. The classical first order scheme shows smeared out shocks and no contact discontinuities while the first order 3D result shows an accuracy comparable with classical 2nd order. From the isomachlines near the solid walls one can conclude that the multi-D result shows less entropy creation than the classical schemes.
7.  CONCLUSIONS

Genuinely 3D/2D multidimensional upwind schemes are developed for the Euler/Navier-Stokes equations. The schemes are formulated in the framework of dimensional-split central or upwind dissipation models, leading to a new concept of compact 3D/2D multidimensional upwind dissipation.

A unification of 2D compact linear schemes is shown based on two classes of 6-point stencils. Each class has a two-parameter subclass of zero cross diffusion schemes and a unique second order scheme. Both families have a 4-point subclass in common that has a unique minimum cross diffusion scheme and a zero cross diffusion scheme.

A class of 3D scalar convection schemes based on an 8-point compact stencil is derived that reduces to the 4-point subclass in 2D. It has a unique first order scheme with minimum cross diffusion and a one-parameter subclass of zero cross diffusion schemes being second order accurate for a homogeneous convection equation.

Second order monotone schemes are explored. The dissipation is written as the 1st order minimum cross diffusion dissipation plus additional anti-diffusive terms. Three- and two-dimensional limiters, depending on ratios of flux differences in different mesh directions, are introduced. In 2D two families of ratios related to two types of triangles are defined. In each class two sub-families are considered related to variations along the mesh line or along a diagonal.

Extensions to the Euler/Navier-Stokes equations are obtained through a characteristic decomposition using characteristic variables with 3 different propagation directions. A review is given of different choices for the directions. For supersonic flow the combination of pressure gradient and velocity seems to be an accurate choice but the Mach angle splitting is more robust. Application of the 3D minimum cross diffusion scheme in combination with the pressure gradient approach to a 3D supersonic testcase shows comparable accuracy with a classical 2nd order dimensional-split upwind scheme.

For subsonic and supersonic flow the streamline direction is not yet accurate enough and so research is still needed to identify more effective choices.

A new formulation is introduced where both the convective and source terms can be discretised with a multi-D scheme using its associated characteristic speed. Application of this new approach in combination with the Mach angle splitting directions to a 2D supersonic testcase shows better accuracy with lower total temperature error than classical 2nd order dimensional-split upwind schemes.
Figure 9  Supersonic Laval nozzle (M=2.91), isomachlines, 1.41<M<2.91, ΔM=0.02. Panel: 2nd order FDS, 128x32.

Figure 11  Supersonic Laval nozzle (M=2.91), total temperature distribution along symmetry-axis.

Figure 15  Supersonic corner flow, isomachline comparison (2.0<M<3.0, ΔM=0.02). Panels: (a) FDS1, (b) 8MCD, (c) FDS2-Minmod limiter.
SUMMARY

A second order accurate (both in time and space) explicit/implicit scheme is implemented for the solution of the three-dimensional incompressible Navier-Stokes equations involving high Reynolds number flows about complex configurations. A fourth order accurate artificial dissipation term on the momentum equations is used for stabilization. The Finite Element Method (FEM) with an explicit time marching scheme is used for the solution, and the element by element (E-B-E) technique is employed in order to ease the memory requirements needed by the storage of the stiffness matrix of FEM. The cubic cavity problem, laminar flow past a sphere at a high Reynolds number and an incompressible viscous flow around the fuselage of a helicopter are successfully solved using the first and the second order accurate schemes. Comparisons of the results are also provided.

1.  INTRODUCTION 

Recent advances in iterative solution techniques have enabled CFD researchers to solve large scale problems in acceptable computation times with the fast processors of the 90's. Iterative solvers have become convenient CFD tools which do not require excessive computer memory for either implicit time marching schemes or the inversion of elliptic equations. For finite element computations, element by element (E-B-E) iteration schemes demand the least amount of memory. The conjugate gradient (CG) method, which is the Krylov subspace technique applied to symmetric operators, becomes an efficient, indeed the fastest converging, iterative method when applied with preconditioning (PCG) to the discrete form of the equations.
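As an illustration of the PCG idea, a minimal sketch follows. It assumes a diagonal (Jacobi) preconditioner and a small 1D Laplacian test operator; neither is stated in the paper, and the function names are hypothetical. The operator is supplied only as a matrix-vector product, so no global matrix is stored.

```python
import math


def dot(x, y):
    """Inner product of two plain lists."""
    return sum(a * b for a, b in zip(x, y))


def pcg(apply_a, b, diag, tol=1e-10, max_iter=200):
    """Jacobi-preconditioned conjugate gradients for a symmetric positive
    definite operator given as a function apply_a(x) -> A x."""
    x = [0.0] * len(b)
    r = list(b)                                   # residual b - A x for x = 0
    z = [ri / di for ri, di in zip(r, diag)]      # preconditioned residual
    p = list(z)
    rz = dot(r, z)
    for _ in range(max_iter):
        if math.sqrt(dot(r, r)) < tol:
            break
        ap = apply_a(p)
        alpha = rz / dot(p, ap)
        x = [xi + alpha * pi for xi, pi in zip(x, p)]
        r = [ri - alpha * api for ri, api in zip(r, ap)]
        z = [ri / di for ri, di in zip(r, diag)]
        rz_new = dot(r, z)
        p = [zi + (rz_new / rz) * pi for zi, pi in zip(z, p)]
        rz = rz_new
    return x


def laplacian_1d(x):
    """Matrix-free product with the tridiagonal matrix (-1, 2, -1)."""
    n = len(x)
    return [
        2.0 * x[i]
        - (x[i - 1] if i > 0 else 0.0)
        - (x[i + 1] if i < n - 1 else 0.0)
        for i in range(n)
    ]


if __name__ == "__main__":
    rhs = [1.0] * 5
    print(pcg(laplacian_1d, rhs, [2.0] * 5))
```

Only a handful of vectors of length n are kept in memory, which is the property that makes such solvers attractive on small machines.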

During the last two decades, the solution of the three-dimensional Navier-Stokes equations has received considerable attention. However, for a numerical technique to fulfill the demands of the 90's, the accuracy of the scheme must be at least second order in both time and space discretizations.

In this study, a second order accurate (both in time and space) explicit/implicit scheme is implemented for the solution of the three-dimensional incompressible Navier-Stokes equations involving high Reynolds number flows about complex configurations. To do so, a fourth order accurate artificial dissipation term on the momentum equations is used for stabilization. The Finite Element Method (FEM) with an explicit time marching scheme is used for the solution, and the element by element (E-B-E) technique is employed in order to ease the memory requirements needed by the storage of the stiffness matrix of FEM [1]. Since the scheme is time accurate, the transient nature of the flow field is properly resolved.

For the calibration of the code the cubic cavity problem is solved using the first and the second order accurate schemes. The comparison with the existing literature [2] is satisfactory even for a coarse grid. The solution with the fourth order artificial viscosity adequately resolves the recirculating region as opposed to the solution with the second order artificial viscosity.

As the second study, laminar flow past a sphere at a high Reynolds number, Re = 162 000, is solved with both schemes. Finally, in order to test the capabilities of the code, an incompressible viscous flow at Re = 50 000 around the fuselage of a helicopter is studied.

All the computations are performed on a personal computer equipped with an i860 Number Smasher board with 32 Mbytes of memory.
2.  FORMULATION

2.1  Governing Equations

The Navier-Stokes and the continuity equations for the unsteady, incompressible flow of a viscous fluid, in the absence of body forces, are:

$$ \frac{D\mathbf{V}}{Dt} = -\nabla p + \frac{1}{Re}\,\nabla^2\mathbf{V} \qquad (1) $$

$$ \nabla\cdot\mathbf{V} = 0 \qquad (2) $$

The equations are written in vector form (boldface type symbols denote vector or matrix quantities). The variables are non-dimensionalized using a reference velocity and a characteristic length, as usual. Re is the Reynolds number, Re = Ul/ν, where U is the reference velocity, l is the characteristic length and ν is the kinematic viscosity of the fluid. V corresponds to the Cartesian velocity components u, v and w. Pressure is denoted by p and time by t.

For a well posed problem, the governing equations are complemented with the following initial (t=0)

$$ \mathbf{V}(\mathbf{x},0) = \mathbf{V}^0(\mathbf{x}) \quad \text{and} \quad p(\mathbf{x},0) = p^0(\mathbf{x}) \qquad (3) $$

and boundary conditions which have to be specified on related surfaces:

$$ \mathbf{V} = \mathbf{G} \quad \text{and} \quad -p\,\mathbf{n} + \frac{1}{Re}\,(\nabla\mathbf{V})\cdot\mathbf{n} = \mathbf{F} \qquad (4) $$

where x is the position vector, G and F are prescribed boundary values, and n is the unit vector normal to the boundary.

2.2  Numerical  Methods 

The governing equations are integrated in time using both first and second order accurate schemes. The first order scheme follows that of [3], which constitutes a time marching scheme based on Helmholtz decomposition. A potential function with a single degree of freedom at each node is introduced and a Poisson equation for the potential is directly discretized. Eight-node isoparametric brick elements and trilinear interpolation functions for the velocity and the auxiliary potential are used. The pressure is defined at the centroid of each element. In contrast to the potential and velocity, pressure values are interpolated using piecewise constant functions at each element. Application of the conventional Galerkin integral [4] to the equations and the boundary conditions gives integral finite element formulations for one brick element [3,5].

2.2.1  First  order  explicit  formulation 

Let $V_1$ and $V_2$ denote the following velocity differences in vector form:

Vi  =  -  V”" 

V2  = 

Using a forward difference operator for the time derivative in equation (1) and letting $V^m$ and $p^m$ be the solutions at the known time level m, the first order explicit fractional step algorithm, over a single time step and in fully discrete matrix form, is given in the α direction as follows:

M-^a  _ 
At  1  " 

Ba  +  PeCa  — 

(A+d) 

iV, 

m 

(5) 

A  (i>='Ea 

''  Of 

(6) 

M  V“  = 

Ea  4> 

(7) 

,  4^e 

"  At 

(8) 

where φ is the auxiliary potential function, M is the lumped element mass matrix, D is the advection matrix, A is the stiffness matrix, C is the coefficient matrix for pressure, B is the vector due to boundary conditions and E is the matrix which arises due to incompressibility. The element potential $\phi_e$ is defined as

$$ \phi_e = \frac{1}{\mathrm{vol}(\Omega_e)} \int_{\Omega_e} N_i\,\phi_i\, d\Omega_e, \quad i = 1,\ldots,8 \qquad (9) $$

where Ω is the flow region to be solved, Γ is the boundary of Ω and $N_i$ are the shape functions. Details of the formulation can be found in [5].

2.2.2  Second  order  explicit  formulation 

The second order time accurate scheme is somewhat similar to that of [6], wherein a new intermediate velocity field is introduced. Both explicit and implicit versions of the algorithm are devised. The explicit formulation resembles the first order explicit scheme except that the fractional step velocities are calculated in two steps. Let $V_1$, $V_2$ and $V_3$ denote the following velocity vector differences:

$$ V_1 = V^{m+1/2} - V^m $$

$$ V_2 = V^{*} - V^m $$

$$ V_3 = V^{m+1} - V^{*} $$

The second order explicit fractional step algorithm, in fully discretised matrix form, over a single time step is defined by:


2M\/a  _ 
At  '^1  ~ 


Ba  +  PcCq 


(A  +  d)  V, 


(10) 


M-tra 
At  2 


PfC„  + 


+  D  V, 


iA<h  = 
MVf  = 


(11) 

(12) 

(13) 


^m+1 


0e 

At 
The factor ½ appearing in (12) and (13) is used for second order accuracy in time. In the formulation given above, V* is a velocity vector which is not solenoidal.


2.2.3  Second order implicit formulation

The implicit fractional step formulation follows the same steps as does the explicit one. However, the formulation is obtained by adopting a Crank-Nicolson representation for the diffusion terms, but otherwise retaining the explicit formulation as before.

Using the same velocity difference formulas defined for the second order explicit formulation above, the second order implicit Galerkin fractional step algorithm, in fully discretised matrix form, over a single time step is defined by:

(“  +  A)  Vf  =  (14) 

[b„  +  P.C„-(^+D)vJ” 


(M  +  5e)v?=  (15) 

(b„  +  P,C,  -  ^v„)”  -  (D 


(16) 
For the implicit solution of equations (14)-(17), the Element By Element (E-B-E) technique [1] is employed in order to ease the memory requirements of storing the FEM stiffness matrix. The iterative solution is fully vectorized [7]. The right hand side values of these equations are scaled with the square of the time step to increase accuracy. This scaling is found to reduce the number of iterations by almost 50%.
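The memory saving behind the element-by-element idea can be illustrated with a small sketch. The Python fragment below is only an illustration under stated assumptions, not the authors' vectorized solver: it uses hypothetical two-node one-dimensional elements rather than brick elements, and the names `ebe_matvec`, `element_nodes` and `element_matrices` are invented here. It forms the product of the stiffness matrix with a vector by looping over element matrices, so the global matrix is never assembled or stored.

```python
def ebe_matvec(element_nodes, element_matrices, x):
    """Return y = K x without assembling K.

    element_nodes    -- list of node-index tuples, one per element (assumed layout)
    element_matrices -- list of small dense element matrices (lists of rows)
    x                -- global vector as a list of floats
    """
    y = [0.0] * len(x)
    for nodes, ke in zip(element_nodes, element_matrices):
        # Gather the entries of x that belong to this element.
        local_x = [x[n] for n in nodes]
        # Multiply by the element matrix and scatter-add into the result.
        for global_row, row in zip(nodes, ke):
            y[global_row] += sum(k * v for k, v in zip(row, local_x))
    return y


if __name__ == "__main__":
    # Three unit-length linear elements on four nodes (1D Laplacian stiffness).
    nodes = [(0, 1), (1, 2), (2, 3)]
    ke = [[1.0, -1.0], [-1.0, 1.0]]
    print(ebe_matvec(nodes, [ke, ke, ke], [0.0, 1.0, 4.0, 9.0]))
```

An iterative solver such as a Jacobi or conjugate gradient loop only needs this product, which is why storage of the assembled matrix can be avoided.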


2.3  Artificial  Dissipation 

In the present study, a fourth order accurate artificial dissipation term in the momentum equations is used for stabilization. The diffusion term is added explicitly to the right hand side of equation (1). The formulation given in reference [8] is extended to three dimensions. The artificial viscosity term is computed in two steps at element level. First, a second-order differencing is accomplished:

8 

i=i 

These  values  give  the  second-order  distributions 
to  cell  corners  (i)  for  the  momentum  equations. 
Then,  fourth  order  distributions  to  cell  corners 
are  formed  using  the  above  values: 


j=i 

These fourth order viscosity terms are multiplied by a certain coefficient when added to the momentum equations. All the velocity components are multiplied by the same coefficient, c < 1/24. No dissipation term is added to the Poisson equation for the potential.
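The two-step construction described above can be sketched in one dimension. This is a simplified analogue under stated assumptions, not the element-level three-dimensional formulation of the paper: the stencil is a plain three-point undivided difference on a uniform line of nodes, the end nodes are left untouched, and the function names and the default coefficient 0.02 (chosen only to satisfy c < 1/24) are hypothetical.

```python
def second_difference(values):
    """Undivided second difference; the two end nodes are left at zero."""
    n = len(values)
    d2 = [0.0] * n
    for i in range(1, n - 1):
        d2[i] = values[i - 1] - 2.0 * values[i] + values[i + 1]
    return d2


def fourth_order_dissipation(values, coeff=0.02):
    """Two-pass dissipation term: a second difference of the second difference.

    The result is meant to be added to the right hand side; the minus sign
    makes it damp grid-scale (sawtooth) oscillations.
    """
    d2 = second_difference(values)   # step 1: second-order differencing
    d4 = second_difference(d2)       # step 2: fourth-order term from step 1
    return [-coeff * x for x in d4]


if __name__ == "__main__":
    sawtooth = [(-1.0) ** i for i in range(9)]
    print(fourth_order_dissipation(sawtooth))
```

Away from the ends, a smooth (for example quadratic) field produces no dissipation, while a sawtooth field receives a correction of opposite sign at every node.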


3.  RESULTS  AND  DISCUSSION 

For the calibration of the code, the cubic cavity problem is solved using the first and the second order accurate schemes. The grid used is fairly coarse, 11x11x11. The first order scheme, with


iA<^  =  E,  V* 
second order artificial dissipation, gives low velocity gradients in the vicinity of the walls, as seen in Fig. 1 a and b. The second order accurate scheme, on the other hand, predicts velocity profiles that agree, even with a coarse grid, with the results obtained with spectral methods [2].


Shown in Fig. 1 c and d are the symmetry plane velocity vectors obtained with the first and second order schemes, respectively. The flow Reynolds number is 1000 and the dimensionless time level is 30, at which the steady state is practically reached.


Fig. 1 Cubic cavity velocity profiles for Re=1000 in comparison with the results of Ref. [2] on the symmetry plane at steady state (a-b). Present solutions with fourth and second order artificial dissipations are shown. Flow velocity vectors on the symmetry plane (c-d): c) solution with second order dissipation, d) solution with fourth order dissipation. 11x11x11 stretched grid.
Fig. 2 Numerical grid on the symmetry plane of the sphere: a) full grid, b) near body detail.

As the second study, laminar flow past a sphere at a high Reynolds number, Re = 162 000, is solved with both the first and the second order schemes. The full grid around the sphere consists of 19127 points and 17640 brick elements, as shown in Fig. 2a. Fig. 2b shows the details of the grid around the body. In Fig. 3 a and b the velocity vectors on the symmetry plane at a time level of about 4 are plotted. The length of the separation bubble predicted by both approaches is almost the same; however, the width differs significantly. The second order accurate scheme predicts a separation angle close to the value given in [9]. As seen in Fig. 3 a and b, the flow is symmetric with respect to the mid plane, and in the upper half of the plane there is a major clockwise recirculating bubble. The details of the separation region, however, are predicted with the more accurate scheme, as seen in Fig. 3b, wherein a smaller bubble with clockwise rotation is present upstream of the major one. A more detailed picture of the region right after the shoulder is given in Fig. 4b, where there is a very small counterclockwise rotating bubble in between the major and the minor clockwise rotating bubbles.

All these details are smeared out by the first order method, as seen in Fig. 4a, even with finer resolution in the radial direction.

The third problem solved is related to an institutional project for developing a generic helicopter. The grid around the fuselage is shown in Fig. 5, where 11280 brick elements with 12915 nodes are used to resolve the symmetric half of the flow field. The flow Reynolds number, based on the height of the body taken as the characteristic length, is 50 000. Shown in Fig. 6 a and b are the symmetry plane velocity vector fields at about the steady state obtained with the first and second order schemes, respectively. The results of the second order scheme indicate a longer separation region in the wake (Fig. 6b). A detailed picture of the wake is given in Fig. 7 a and b, wherein the separation bubble obtained with the second order scheme is twice as long as the bubble obtained with the first order scheme. Also seen in Fig. 6b is a small separation region at the bottom of the fuselage, where there is an unfavorable pressure gradient. The first order scheme cannot predict that separation region because of its high artificial diffusion. A detailed picture of this unfavorable pressure region is provided in Fig. 8 a and b for the first and second order schemes, respectively.

The pressure distribution on the body surface at the symmetry plane is given in Fig. 9. According to this figure, the pressure values follow, in general, the same trend for both solutions; however, the two solutions do not agree in the unfavorable pressure gradient region at the bottom surface.
a) Second order dissipation. b) Fourth order dissipation.

Fig. 6 Velocity vectors on the symmetry plane of the fuselage, Re=50 000: a) solution with second order dissipation, b) solution with fourth order dissipation.


Shown in Table 1 are the drag coefficient values for the sphere compared with the experimental data [9]. As seen from the values, the first order scheme overestimates the coefficient, whereas the second order scheme underestimates it, compared to the experimental value.

The drag coefficient values evaluated for the helicopter fuselage with both schemes are also given in Table 1.
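The size of the over- and underestimation for the sphere can be made concrete with a short calculation. The sketch below only restates the Table 1 values for the sphere (0.47 from experiment, 0.52 and 0.38 from the first and second order schemes); the function name `relative_deviation` is introduced here for illustration.

```python
def relative_deviation(computed, reference):
    """Signed relative deviation of a computed value from a reference value."""
    return (computed - reference) / reference


# Sphere drag coefficients at Re = 162 000, as listed in Table 1.
CD_EXPERIMENT = 0.47
CD_FIRST_ORDER = 0.52
CD_SECOND_ORDER = 0.38

if __name__ == "__main__":
    for label, cd in (("first order", CD_FIRST_ORDER),
                      ("second order", CD_SECOND_ORDER)):
        dev = 100.0 * relative_deviation(cd, CD_EXPERIMENT)
        print(f"{label}: {dev:+.1f}% relative to experiment")
```

With these values the first order result lies about 11% above the experiment and the second order result about 19% below it.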

Fig. 7 Velocity vector details at the wake region of the fuselage, Re=50 000: a) solution with second order dissipation, b) solution with fourth order dissipation.

Fig. 8 Velocity vector details at the unfavorable pressure region of the fuselage: a) solution with second order dissipation, b) solution with fourth order dissipation.

Fig. 9 Pressure coefficient (Cp) values on the body surface at the symmetry plane.

Geometry    Re number    Scheme            Cd
Sphere      162 000      Experiment [9]    0.47
                         First order       0.52
                         Second order      0.38
Fuselage    50 000       First order       0.20
                         Second order      0.11

Table 1 Drag coefficient values for the sphere and the helicopter fuselage.

CONCLUSION

A computer code based on a second order accurate scheme is developed and implemented for flows involving large separations and strong recirculations about arbitrary shapes.

The results obtained for various test cases are in good agreement with the existing numerical and experimental data.

The code is implemented satisfactorily to predict the drag coefficient of a generic helicopter fuselage.
SUMMARY

Chorin's method of artificial compressibility is extended to both compressible and incompressible fluids by using physical arguments to define artificial fluid properties that make up a local preconditioning matrix. In particular, perturbation expansions are used to provide appropriate temporal derivatives for the equations of motion at both low speeds and low Reynolds numbers. These limiting forms are then combined into a single function that smoothly merges into the physical time derivatives at high speeds so that the equations are left unchanged at transonic, high Reynolds number conditions. The effectiveness of the resulting preconditioning procedure for the Navier-Stokes equations is demonstrated for wide speed and Reynolds number ranges by means of stability results and computational solutions. Nevertheless, the preconditioned equations sometimes fail to provide a solution for applications for which the non-preconditioned equations converge. Often this is because the reduced dissipation in the preconditioned equations results in an unsteady solution while the more dissipative non-preconditioned equations result in a steady state. Problems of this type represent a computational challenge: it is important to distinguish between non-convergence of algorithms and the non-existence of steady state solutions.

1  INTRODUCTION 

Time-marching techniques have proven to be very effective for the computation of high Reynolds number flows in the transonic, supersonic and hypersonic regimes. These methods, however, become inefficient at low speed or low Reynolds number conditions, including the near wall regions of high Reynolds number flows. For this reason, incompressible and low speed computations were dominated by pressure-based procedures [1] for many years. Chorin's pseudo-compressibility method [2], which has become widely accepted for incompressible flows [3], opened one avenue for applying time-marching procedures to incompressible flows, but until recently there was little realization that this procedure could be broadened to enable computations at all speeds.

Extensions of time-marching methods to low Mach number compressible flows became possible with the realization that it was the stiffness of the eigenvalues that slowed convergence at low speeds. Low Mach number perturbation procedures were first used to remove these problems [4] and were used in pressure-based methods to compute low speed compressible solutions. The implementation of time-marching methods for the low Mach number perturbation equations was first reported by Gustafsson [5], followed by extensive applications by the present authors [6]. Perturbation expansion methods have also been extended to combustion problems [7]. Of these perturbation expansion methods, some [6, 7] used the more conventional expansion procedures based on the square of the Mach number, while others [5, 8] expanded the equations in terms of the first power of the Mach number.
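The eigenvalue stiffness mentioned above can be quantified with a short sketch. It assumes the standard wave speeds of the one-dimensional Euler equations, u, u + c and u - c, and measures stiffness as the ratio of the largest to the smallest magnitude; the function name `eigenvalue_stiffness` and the choice to work with speeds scaled by the sound speed are illustrative choices made here.

```python
def eigenvalue_stiffness(mach):
    """Ratio of largest to smallest wave-speed magnitude for u, u + c, u - c.

    Speeds are scaled by the sound speed c, so they become M, M + 1 and M - 1.
    """
    if mach <= 0.0:
        raise ValueError("Mach number must be positive")
    magnitudes = [abs(mach), abs(mach + 1.0), abs(mach - 1.0)]
    smallest = min(magnitudes)
    if smallest == 0.0:
        return float("inf")  # sonic point: one acoustic speed vanishes
    return max(magnitudes) / smallest


if __name__ == "__main__":
    for m in (0.5, 0.1, 0.01, 0.001):
        print(m, eigenvalue_stiffness(m))
```

For small Mach numbers the ratio behaves like 1 + 1/M, so it grows without bound as the flow speed goes to zero, which is the stiffness that preconditioning is meant to remove.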

In parallel with these perturbation procedures, local preconditioning methods in which the time derivatives of the equations of motion are multiplied by a matrix to control the eigenvalues have also been used to enhance convergence [8-16]. Unlike the perturbation equations, these preconditioned equations are valid at all speeds, and so have a potential for providing uniform convergence over all Reynolds and Mach number regimes. Two distinct philosophies have been followed in developing these preconditioning methods. One uses the perturbation procedures described above and deals with the full Navier-Stokes equations and includes the Euler equations as a special case [11-14]. The intent of this approach is to improve convergence at low speeds and Reynolds numbers only, while leaving it unaltered at high Reynolds numbers and high speeds (transonic and above) where it is already quite efficient. This method has been applied extensively to a wide variety of practical applications.

The second approach [15, 16] provides a rigorous method for developing a preconditioning matrix for the Euler equations, but equally rigorous extension to the Navier-Stokes equations appears doubtful. This preconditioning procedure is intended to provide optimum convergence over the entire Mach number regime, but limited applications have thus far demonstrated convergence enhancement only in the low Mach number regime [16]. Even there, this second method is generally less effective than that provided by the perturbation-expansion-based methods. Further, the convergence enhancement to be had at transonic and supersonic speeds is very limited because time-marching methods are already efficient there, so that substantial improvements are unlikely.

The purpose of the present paper is to demonstrate how a viscous preconditioning procedure can be developed from the basic physics of the flow using low speed and low Reynolds number perturbation expansions. As a part of this development, the link between our compressible preconditioning method and the artificial compressibility method of Chorin is shown. Following some representative examples of convergence enhancement for a wide variety of problems, the paper closes by addressing the issue of the robustness of preconditioning methods. One specific example is given in which the preconditioned methods fail to provide convergence to a steady state. Detailed investigation shows that the physical problem is unsteady and a steady solution fails to exist. The reduced artificial dissipation in the preconditioned solution makes this unsteadiness more apparent. The prospect of distinguishing non-convergence from the non-existence of steady state solutions is thus raised as a challenge facing CFD techniques.


Paper presented at the AGARD FDP Symposium on “Progress and Challenges in CFD Methods and Algorithms” held in Seville, Spain, from 2-5 October 1995, and published in CP-578.
2 THE EQUATIONS OF MOTION

The equations of motion can be written in conservative form as:

$\frac{\partial Q}{\partial t} + \frac{\partial E}{\partial x} + \frac{\partial F}{\partial y} + \frac{\partial G}{\partial z} = L_v(Q_v)$  (1)

where the viscous terms are given by the operator $L_v$, and the vectors $Q$, $Q_v$, and $E$ are

$Q = (\rho, \rho u, \rho v, \rho w, e)^T, \quad Q_v = (p, u, v, w, T)^T, \quad E = (\rho u, \rho u^2 + p, \rho u v, \rho u w, (e + p)u)^T$  (2)

with an analogous definition for $F$ and $G$. Here, $\rho$ represents the density, $p$ is the pressure, and $u$, $v$ and $w$ are the Cartesian velocity components in the $x$, $y$ and $z$ directions respectively. The total energy, $e$, is the sum of the internal energy, $\varepsilon$, and the kinetic energy,

$e = \rho\varepsilon + \frac{\rho}{2}(u^2 + v^2 + w^2)$  (3)

The enthalpy, $h$, is related to the internal energy and the pressure,

$\rho h = \rho\varepsilon + p$  (4)

and for a perfect gas can be expressed as a function of the temperature alone, $h = h(T)$. The stagnation enthalpy is defined as $h^0 = h + (u^2 + v^2 + w^2)/2$. The formulation is completed by the perfect gas equation of state which we write as,

$\rho = \rho(p, T) = p/RT$  (5)

to emphasize that the density depends on the temperature and pressure. This form makes it possible to include incompressible fluids and perfect gases in a single procedure.

The "viscous" vector, $Q_v$, that appears in Eqs. 1 and 2 represents the dependent variables that appear naturally in the diffusion terms. Because the first cell of this variable (corresponding to the continuity equation) is null, we choose to fill it with the pressure, $p$. This choice makes $Q_v$ a unique function of the conservative variable $Q$. For convenience, we use this set of primitive variables as our primary dependent variable set, but we retain the conservative fluxes.

The variables in the time derivative can easily be changed from $Q$ to $Q_v$ by means of the chain rule,

$\frac{\partial Q}{\partial Q_v}\frac{\partial Q_v}{\partial t} + \frac{\partial E}{\partial x} + \frac{\partial F}{\partial y} + \frac{\partial G}{\partial z} = L_v(Q_v)$  (6)

where $\partial Q/\partial Q_v$ represents the Jacobian,

(7)

where $\rho_p$, $\rho_T$ and $h_T$ are partial derivatives. For a perfect gas $\rho_T = -\rho/T$, $\rho_p = 1/RT$ and $h_T = \gamma R/(\gamma - 1)$, where $\gamma$ is the ratio of specific heats. Note that $h_T$ is the specific heat at constant pressure.

Other matrices of interest include the Jacobian, $A_v = \partial E/\partial Q_v$, with analogous expressions for $B_v = \partial F/\partial Q_v$ and $C_v = \partial G/\partial Q_v$.


3 LOW MACH NUMBER SCALING

The eigenvalues of (6) determine the convergence rate of the time-marching algorithm. These eigenvalues are obtained from the roots of the fifth order polynomial:

which are readily found to be $u, u, u, u \pm c$ where the acoustic speed, $c$, is given by,

$c^2 = \frac{\rho h_T}{\rho_T + \rho\rho_p h_T}$  (10)

The speed of sound reduces to the familiar relation, $c^2 = \gamma RT$, for a perfect gas, while, for an incompressible fluid where $\rho_p = \rho_T = 0$, the speed of sound becomes infinite, causing the time derivatives in the continuity equation to vanish so that continuity reduces to $\nabla \cdot V = 0$.

To ensure uniform, efficient convergence over all speed ranges, we replace the matrix $(\partial Q/\partial Q_v)$ in (9) by a preconditioning matrix, $\Gamma_v$, and consider the solution of the modified equation,

$\Gamma_v\frac{\partial Q_v}{\partial t} + \frac{\partial E}{\partial x} + \frac{\partial F}{\partial y} + \frac{\partial G}{\partial z} = L_v(Q_v)$  (11)

We define $\Gamma_v$ in a form analogous to $\partial Q/\partial Q_v$ by replacing the physical properties, $\rho_p$, $\rho_T$ and $h_T$, by the artificial quantities, $\rho'_p$, $\rho'_T$ and $h'_T$ respectively. These quantities represent a three-parameter preconditioning system whose values can be chosen to ensure well-conditioned eigenvalues at all speeds, thereby ensuring fast, efficient convergence. The definition of the three parameters in the preconditioning matrix, $\Gamma_v$, will be obtained from perturbation analyses of the equations of motion at low speeds and low Reynolds numbers. Their presence introduces an artificial speed of sound, $c'$, that ensures that eigenvalue stiffness is avoided. Additional restrictions on the preconditioning matrix have been given by Viviand [9] and Choi and Merkle [12].
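The general sound speed relation can be checked numerically against the perfect gas limit. The sketch below assumes the relation reads $c^2 = \rho h_T/(\rho_T + \rho\rho_p h_T)$ and uses the perfect gas derivatives quoted in the text ($\rho_p = 1/RT$, $\rho_T = -\rho/T$, $h_T = \gamma R/(\gamma - 1)$); the function names and the air-like default values of R and gamma are assumptions made for illustration.

```python
def sound_speed_squared(rho, rho_p, rho_T, h_T):
    """General relation c^2 = rho*h_T / (rho_T + rho*rho_p*h_T)."""
    return rho * h_T / (rho_T + rho * rho_p * h_T)


def perfect_gas_properties(p, T, R=287.0, gamma=1.4):
    """Density and its partial derivatives for a perfect gas, plus h_T."""
    rho = p / (R * T)
    rho_p = 1.0 / (R * T)              # d(rho)/dp at constant T
    rho_T = -rho / T                   # d(rho)/dT at constant p
    h_T = gamma * R / (gamma - 1.0)    # specific heat at constant pressure
    return rho, rho_p, rho_T, h_T


if __name__ == "__main__":
    p, T, R, gamma = 101325.0, 300.0, 287.0, 1.4
    c2 = sound_speed_squared(*perfect_gas_properties(p, T, R, gamma))
    print(c2, gamma * R * T)
```

As $\rho_p$ and $\rho_T$ are both driven toward zero the denominator vanishes, which reproduces the infinite sound speed of the incompressible limit described above.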

To overcome the difficulties at low speeds, we expand $Q_v$ in the power series,

$Q_v = Q_{v0} + \epsilon Q_{v1} + \cdots$  (12)

where $\epsilon = M^2$. Upon substituting this expression into (11), we obtain to order $1/\epsilon$, $p_0$ = constant, which says that the thermodynamic pressure is externally imposed. Scaling the temporal and spatial derivatives of pressure to order unity then causes the term, $p_1$, to appear in the zeroth order equations. To reflect this, we define the vector, $Q'_{v0}$,

$Q'_{v0} = (p_1, u_0, v_0, w_0, T_0)^T$  (13)

so that the equation system that is valid to order unity becomes (for simplicity here, we write only the one-dimensional equations),

$\Gamma_{v0}\frac{\partial Q'_{v0}}{\partial t} + A_{v0}\frac{\partial Q'_{v0}}{\partial x} = L_v(Q'_{v0})$  (14)

The corresponding matrices $\Gamma_{v0}$ and $A_{v0}$ are given by evaluating $\Gamma_v$ and $A_v$ with the values $Q_{v0}$.

Requiring that the temporal pressure derivative be retained in the continuity equation in the low speed limit implies that $\rho'_p$ must be of order one, or that $\rho'_p$ is given by

$\rho'_p = k_p/V_r^2$  (15)

where $k_p$ is a constant of order unity and $V_r$ is an appropriate reference velocity.

To ensure that the energy equation is uncoupled from the continuity and momentum equations in the incompressible limit, we make the variable $\rho'_T$ proportional to $\rho_T$,

$\rho'_T = k_T\rho_T$  (16)

where $k_T$ is a quantity whose value is less than or equal to one. This replacement causes $\rho'_T$ to vanish when $\rho_T$ goes to zero. Specific values for $k_T$ are 0 or 1.

Placing Eqs. 15 and 16 in $\Gamma_v$ gives well-conditioned eigenvalues in the limit of low Mach numbers. The quantity $h'_T$ is, as yet, free.

With the special values for $\rho'_p$ and $\rho'_T$ given in (15) and (16), two eigenvalues of $\Gamma_v^{-1}A_v$ become equal to the particle speed, $u$. The third eigenvalue also equals $u$ if $h'_T = h_T$ or if the physical properties $\rho_p$ and $\rho_T$ are zero as in incompressible flow. For these conditions, the full set of eigenvalues of $\Gamma_v^{-1}A_v$ is:

$\Lambda = \mathrm{diag}\{u, u, u, u(a + \sqrt{b}), u(a - \sqrt{b})\}$  (18)

where the quantities $a$ and $b$ are given by,

(19a)

Inspection of the generalized acoustic eigenvalues in (19) shows that the physical properties, $\rho_p$ and $\rho_T$, that cause the speed of sound in incompressible flows to be infinite no longer appear in the denominator: only the artificial properties $\rho'_p$ and $\rho'_T$ do. Replacing these physical properties by properly defined artificial properties alleviates the decoupling between the pressure and momentum terms in incompressible flows, and makes time-marching practical for both incompressible and low speed compressible flow computations. For incompressible flows, this approach leads to the artificial-compressibility method of Chorin as is shown below.

Replacing $\rho'_p$ by $k_p/V_r^2$ and $\rho'_T$ by $k_T\rho_T$ as suggested by the low-Mach number scaling provides eigenvalues that are well-conditioned for low speeds. For a perfect gas, the coefficients, $a$ and $b$, become:

where $M_r = V_r/c$ is the reference Mach number. The behavior of this eigenvalue is difficult to determine from the algebraic form, but it is seen that as the Mach number goes to zero, the eigenvalues approach a constant times the particle speed. Numerical checks verify that this constant is of order unity and the eigenvalues are well-conditioned. Stability results for this condition are presented later.

For incompressible flow, the coefficients $a$ and $b$ become:

$a = 1/2, \quad b = V_r^2/(k_p u^2) + 1/4$  (21)

which are clearly well-behaved. Choosing $V_r^2/(k_p u^2) = 2$ gives eigenvalues whose ratio is no worse than 2. This choice is identical to the artificial compressibility method of Chorin [2]. We also note that for incompressible flow, it is not necessary to set $h'_T = h_T$ to obtain simple algebraic eigenvalues, and the third "particle" eigenvalue becomes $\lambda = u h_T/h'_T$, so the parameter $h'_T$ can be selected to control convergence in the energy equation if desired.
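The behavior of the acoustic eigenvalues $u(a \pm \sqrt{b})$ can be explored with a small sketch. The expressions for `a` and `b` below are not transcribed from the paper: they are derived here, for the special case $h'_T = h_T$, from the one-dimensional characteristic equation $d'\lambda^2 - u(d + d')\lambda + u^2 d - \rho h_T = 0$, with the shorthand $d = \rho h_T\rho_p + \rho_T$ and $d' = \rho h_T\rho'_p + \rho'_T$ introduced for this example. All function and argument names are hypothetical. In the incompressible limit the sketch reproduces $a = 1/2$ and $b = V_r^2/(k_p u^2) + 1/4$, and without preconditioning it returns $u \pm c$.

```python
import math


def acoustic_coefficients(u, rho, h_T, rho_p, rho_T, rho_p_art, rho_T_art):
    """Coefficients a, b of the acoustic eigenvalues u*(a +/- sqrt(b)).

    Assumes h'_T = h_T; rho_p_art and rho_T_art are the artificial properties.
    """
    d = rho * h_T * rho_p + rho_T              # physical combination
    d_art = rho * h_T * rho_p_art + rho_T_art  # artificial combination
    a = (d + d_art) / (2.0 * d_art)
    b = ((d - d_art) / (2.0 * d_art)) ** 2 + rho * h_T / (d_art * u * u)
    return a, b


def eigenvalues(u, a, b):
    """Full eigenvalue set: three particle speeds and two acoustic speeds."""
    root = math.sqrt(b)
    return [u, u, u, u * (a + root), u * (a - root)]


def stiffness_ratio(eigs):
    """Largest over smallest eigenvalue magnitude."""
    mags = [abs(e) for e in eigs]
    return max(mags) / min(mags)


if __name__ == "__main__":
    # Incompressible fluid: rho_p = rho_T = 0, rho'_T = 0, rho'_p = k_p / V_r^2.
    u, v_ref, k_p = 1.0, 1.0, 0.5
    a, b = acoustic_coefficients(u, 1000.0, 4180.0, 0.0, 0.0,
                                 k_p / v_ref ** 2, 0.0)
    print(a, b, stiffness_ratio(eigenvalues(u, a, b)))
```

With $V_r^2/(k_p u^2) = 2$ the acoustic eigenvalues are $2u$ and $-u$, so no eigenvalue magnitude differs from another by more than a factor of two.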


4 LOW REYNOLDS NUMBER EQUATIONS

Having obtained some understanding of the way the Euler equations scale, we now turn to the Navier-Stokes equations and consider their proper scaling in the limit of low Reynolds numbers. Here, we use a similar perturbation expansion, but we let the small parameter, $\epsilon$, be the Reynolds number, Re. We begin by premultiplying (11) by the matrix

This multiplication gives the matrix

(23)

and results in the convective terms,

(24)

for the x-direction. Multiplication by this matrix does not affect the viscous terms in the momentum equations, but the corresponding terms in the energy equation reduce to the conduction term plus the viscous dissipation. The modified energy equation becomes:

(25)

where $\Phi$ is the viscous dissipation, and we have omitted convective terms in y and z. These equations can then be scaled for low Reynolds numbers to see how our three parameters $\rho'_p$, $\rho'_T$ and $h'_T$ must behave in the diffusion-dominated limit.

For low Reynolds numbers, we scale the momentum equations such that the temporal derivatives and the pressure gradient remain of the same order as the viscous terms as the Reynolds number goes to zero. This defines the proper scaling for the pressure and the time, but imposes no conditions on any of our three preconditioning parameters.

Using this reference pressure and time and requiring that the temporal term in the continuity equation balance the convective terms at low Reynolds numbers results in the condition on $\rho'_p$,

$\rho'_p = k'_p Re^2/V_r^2$  (26)

To prevent the temporal derivative of the temperature from appearing in the momentum equation, we also require,

$\rho'_T = k'_T\rho_r Re/T_r$  (27)

In these expressions, $k'_p$ is a constant of order unity, while $k'_T$ is less than or equal to one.

Scaling the energy equation in a manner consistent with these definitions results in the requirement,

$h'_T = k'_h c_{pr}/Pr$  (28)

where $c_{pr}$ is a reference specific heat, $Pr$ is a reference Prandtl number, and $k'_h$ is another constant of order one.

The resulting low Reynolds number equations in one dimension then become:

(29)

Note that in the energy equation, we have assumed that the quantity $V_r^2/(c_{pr}T_r)$ is small. Retaining it adds a pressure gradient term to the energy equation, but does not affect the requirements placed upon our three parameters. Equations (29) are the creeping flow equations.


5 SUMMARY OF PRECONDITIONING PROCEDURE


The correct asymptotic form of the three parameters, $\rho'_p$, $\rho'_T$ and $h'_T$, as determined from the low Mach number scaling and low Reynolds number scaling, is summarized in Table I. Use of these values ensures that the equations are properly scaled in these two limits. To use these parameters for computations at other than the limiting conditions, we need a functional form that transitions smoothly between these limits while also approaching the non-preconditioned equations for high Reynolds number, transonic flows. This functional form will be developed by combining these limiting values into a single continuous function and then verifying the results first by means of stability theory, then by simplified computational problems, and finally by practical applications at low speed, low Reynolds number and transonic conditions.

Preconditioning the Euler equations is relatively easy, but preconditioning the Navier-Stokes equations is more difficult for several reasons. First of all, the appropriate Reynolds number must be determined. Stability results show that the cell Reynolds number, $u\Delta x/\nu$ (where $u$ represents the local velocity, $\Delta x$ represents the grid spacing and $\nu$ is the kinematic viscosity), is the appropriate viscous scale, and that diffusive effects become dominant at cell Reynolds numbers less than unity. The transition from inviscid- to viscous-dominated flows thus depends on both the flowfield and the grid. Viscous flows can switch from convection-dominated to diffusion-dominated because of increased grid resolution or stretching. The second reason for difficulty arises because the presence of boundary layers at high Reynolds numbers requires high aspect ratio grids with fine resolution normal to the walls. Correspondingly, there are two cell Reynolds numbers of widely differing magnitude. The one based on the normal grid spacing is generally diffusion-dominated, while the one based on the streamwise spacing is generally convection-dominated. The issue in viscous preconditioning is to deal with near-wall cells that are viscously-dominated in one direction and convectively-dominated in the other, while simultaneously treating convectively-dominated cells in regions away from the walls.
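The two cell Reynolds numbers of a stretched near-wall cell can be illustrated with a short sketch. It assumes that the same local velocity magnitude is used with the streamwise and the wall-normal spacing, which is a simplification made here; the function names and the sample cell dimensions are hypothetical.

```python
def cell_reynolds(speed, spacing, nu):
    """Cell Reynolds number u*dx/nu for one grid direction."""
    return abs(speed) * spacing / nu


def regime(re_cell):
    """Diffusion dominates below a cell Reynolds number of one."""
    return "diffusion" if re_cell < 1.0 else "convection"


def classify_cell(speed, dx_streamwise, dy_normal, nu):
    """Cell Reynolds number and dominant regime in each grid direction."""
    re_x = cell_reynolds(speed, dx_streamwise, nu)
    re_y = cell_reynolds(speed, dy_normal, nu)
    return {
        "re_streamwise": re_x,
        "re_normal": re_y,
        "streamwise": regime(re_x),
        "normal": regime(re_y),
    }


if __name__ == "__main__":
    # A hypothetical high aspect ratio boundary layer cell (aspect ratio 10 000).
    print(classify_cell(speed=1.0, dx_streamwise=1.0e-2, dy_normal=1.0e-6,
                        nu=1.0e-5))
```

For this sample cell the streamwise cell Reynolds number is of order one thousand while the normal one is of order one tenth, so the same cell is convection-dominated in one direction and diffusion-dominated in the other.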


We demonstrate two ways in which the limiting forms of the artificial properties in Table I can be combined into a single function that can be used over the full Reynolds-Mach number domain. The parameter $\rho'_p$ is the primary quantity in controlling eigenvalues, and we begin by considering this quantity. The simplest procedure is to choose $\rho'_p$ as the minimum of the viscous and inviscid values,

$\rho'_p = (k_p/V_r^2)\,\mathrm{Min}\{1, Re^2\}$  (30)

where we have used the same constant at both conditions. This is equivalent to using the smaller of an inviscid or viscous time step and has proven effective in many problems [11]. Clearly, this corresponds to switching from the inviscid to the viscous value when the Reynolds number goes below unity. The most appropriate Reynolds number for this switch is the cell Reynolds number.

The function in Eq. 30 can likewise be made to merge smoothly with the physical properties at transonic conditions by noting that at Mach one, $V_r^2 = c^2$, so that if we choose $k_p = k'_p = \gamma$, Eq. 30 degenerates to $\rho'_p = \rho_p$ at Mach one. (In computations for incompressible flow we have generally chosen $k_p = k'_p = 1.33$.) The remaining artificial properties can be made continuous by setting $k_T = k'_T = 0$, and by setting $k_h = 1$ and $k'_h = Pr$. This latter choice does not precisely satisfy the viscous matching condition, but since the Prandtl number for most gases is near one, it is close enough to give good results. All the examples we give are based on this combination of artificial properties.

The second procedure is similar, but instead of using a function with a discontinuous slope for $\rho'_p$, we make both the function and its derivatives continuous. Here we define the three parameters as:

Table  I.  Preconditioning  Parameters  Dictated  by 
Reynolds  and  Mach  Number  Scaling 


Term        Low Mach Number        Low Reynolds Number

Pp 

k'pRe^lV^ 

Pr 

krPT 

Rc 

K  hrIPr 

These  functions  reach  the  proper  limits  at  low  Reynolds 
numbers,  low  Mach  numbers,  and  at  high  Reynolds 
number,  transonic  conditions.  In  particular,  the  function 
for  switches  continuously  from  unity  to  1/Pr  as  the  cell 
Reynolds number goes through unity. When the Mach
number approaches unity, $\Gamma_p$ approaches the physical
Jacobian, $\partial Q/\partial Q_v$, and the preconditioned equations
become identically the physical equations. Choosing kp =
0,  as  in  the  first  example  gives  simpler  pre-conditioned 
equations,  but  only  causes  the  modified  eigenvalues  of  the 
equations  to  approach  the  physical  eigenvalues  as  the 
Mach  number  goes  to  unity.  The  equations  remain  distinct. 

In summary, we scale the time derivatives at high cell
Reynolds numbers to keep the convective eigenvalues
well-conditioned, whereas at low cell Reynolds numbers
we scale so that the equations reduce to simple diffusive
equations. We also scale the dominant convective speed so
that it is of the same order as the diffusive time-scale. The low
Reynolds number scaling causes the convective terms to
become stiff, but because they are small, this does not slow
convergence. To assess this scaling, we use Fourier
stability theory for the full Navier-Stokes equations using
Reynolds number and Mach number as parameters.
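
The minimum-based switch of Eq. 30 can be sketched in a few lines. This is only an illustration under stated assumptions: the function name, the default constant and the `re_exponent` argument are hypothetical, and the exponent on the cell Reynolds number defaults to the form in which Eq. 30 is printed here.

```python
def rho_p_artificial(V, Re_cell, k_p=1.0, re_exponent=1.0):
    """Artificial rho'_p from the minimum of the inviscid and viscous values.

    V           local velocity magnitude
    Re_cell     cell Reynolds number (the switch occurs at unity)
    k_p         scaling constant (same constant used in both limits)
    re_exponent hypothetical knob; 1.0 reproduces Eq. 30 as printed
    """
    return (k_p / V**2) * min(1.0, Re_cell**re_exponent)


# Above Re_cell = 1 the inviscid value k_p / V^2 is used unchanged;
# below it the value is reduced in proportion to the cell Reynolds number.
for Re_cell in (0.01, 0.1, 1.0, 10.0, 100.0):
    print(Re_cell, rho_p_artificial(V=2.0, Re_cell=Re_cell))
```

The switch is continuous at a cell Reynolds number of one but has a discontinuous slope there, which is the motivation given in the text for the second, smoother procedure.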


6  STABILITY  AND  CONVERGENCE  OF  THE 
PRECONDITIONED  EQUATIONS 

We begin by comparing the stability characteristics of the
two-dimensional Euler equations with and without
preconditioning at a Mach number of 0.01 and a flow angle
of 45°. Figure 1 shows results for the non-preconditioned
case, while Fig. 2 is for the case with preconditioning.

Figure 1: Euler: CD-ADI, M=0.01, No Preconditioning, CFL=5

Both  of  these  stability  predictions  are  for  central 
differencing  in  space  and  ADI  approximate  factorization  in 
time.  The  stability  results  without  preconditioning 
indicate  the  amplification  factor  is  nearly  unity  (0.9999) 
over  the  mid-  and  low-wave-number  regions,  thereby 
vividly demonstrating the stiffness that is encountered at
low  speeds. 
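
To see what an amplification factor of 0.9999 means in practice, one can estimate how many iterations a single error mode needs to decay by a fixed number of decades. The sketch below assumes the mode is multiplied by a constant factor g every iteration; the choice of ten orders of magnitude is an illustrative assumption, not a figure from the stability analysis.

```python
import math


def iterations_to_converge(g, orders=10):
    """Iterations for an error mode with amplification factor g
    (0 < g < 1) to drop by `orders` decades, assuming error_n = g**n."""
    return math.ceil(-orders * math.log(10.0) / math.log(g))


# 0.9999 is the non-preconditioned value quoted above; 0.9 is typical of
# a well-conditioned system.
for g in (0.9999, 0.9):
    print(g, iterations_to_converge(g))
```

Under these assumptions the mode with g = 0.9999 needs roughly a thousand times more iterations than the mode with g = 0.9, which is the stiffness the stability maps display.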

By contrast, the amplification factors in Fig. 2 for the
preconditioned case are quite reasonable, with damping rates
of around 0.9 over most of the mid- and low-wave-number
ranges and a sharp fall-off along the axes (except at the
corners), indicating that the preconditioned system will
provide fast, efficient convergence at this low Mach
number condition. We do note that the amplification factor
goes to unity in all four corners, but these peaks are easily
removed by a small amount of artificial dissipation.
Companion stability results (not shown) indicate that this
preconditioned stability result is independent of Mach
number, and that it is nearly identical to that for the
non-preconditioned equations at a Mach number of 0.7,
suggesting that convergence with the preconditioned
system will be similar to the efficient convergence
observed with the non-preconditioned system at high
subsonic Mach numbers. The non-preconditioned
eigenvalues in Fig. 1, however, indicate that this case will
converge very slowly, an indication that is verified by
computations. This demonstrates the ease with which the
stiffness in the Euler equations can be removed.

Figure 3: Euler: LGS-4, I/III, M=0.01, No Preconditioning, CFL=20

To further demonstrate the effectiveness of the
preconditioning, we show stability results for similar
conditions in Figs. 3 and 4, except that upwind
differencing is used for the spatial discretization and line
Gauss-Seidel approximate factorization is used for the
solution procedure. Figure 3 shows the non-preconditioned
stability results for M = 0.01. These eigenvalues again
contain an unacceptable stiffness in the low-wave-number
region. This stiffness is, however, removed by the
preconditioning as shown in Fig. 4. Again, this
preconditioning renders the stability results essentially
independent of Mach number, and gives uniform
convergence at all Mach numbers.

Figure 2: Euler: CD-ADI, M=0.01, With Preconditioning, CFL=5

Figure 4: Euler: LGS-4, I/III, M=0.01, With Preconditioning, CFL=20

Figure 5: Effect of inviscid preconditioning on the
number of iterations to converge to machine zero
versus the Mach number.

The  significance  of  these  stability  results  is  easily  shown 
by applying them to a simple flowfield consisting of
inviscid  flow  in  a  straight  duct.  Figure  5  shows  the 
convergence  of  the  ADI  system  from  an  initial  condition 
corresponding  to  a  small  perturbation  from  the  exact 
(uniform  flow)  solution.  Although  this  problem  appears 
trivial,  the  return  to  uniform  flow  at  low  Mach  numbers 
takes  thousands  of  iterations  without  preconditioning,  but 
is  independent  of  Mach  number  with  preconditioning.  The 
number  of  iterations  required  for  convergence  without 
preconditioning  is  inversely  related  to  the  square  of  Mach 
number,  and  at  A/  =  10'^,  some  10^  iterations  are  required 
to  reach  convergence  to  machine  error.  When 
preconditioning  is  used,  the  number  of  iterations  required 
for  convergence  is  independent  of  Mach  number  and  is 
similar  to  the  number  required  for  the  non-preconditioned 
case  at  transonic  conditions.  The  actual  convergence  rates 
for  some  of  these  cases  are  shown  in  Fig.  6.  Similar 
preconditioned  and  non-preconditioned  results  are  observed 
for  the  line  Gauss-Seidel,  upwind  system.  Applications  to 
a  wide  range  of  practical  problems  have  been  demonstrated 
elsewhere  [11-14],  giving  ample  evidence  that  the  Euler 
equation problem is well in hand.

Figure 6: Convergence of the inviscid straight duct
case at various Mach numbers using the original
equations and the preconditioned equations.
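
The Mach-number dependence of the non-preconditioned convergence can be related to the spread of the wave speeds. The sketch below compares the ratio of the largest to the smallest one-dimensional Euler wave speed with and without preconditioning. The preconditioned speeds use the form commonly quoted for this family of time-derivative preconditioners (u' = u(1 - alpha), c' = sqrt(alpha^2 u^2 + V_r^2), alpha = (1 - V_r^2/c^2)/2 for a perfect gas); that formula, the choice V_r = min(|u|, c) and the function name are assumptions made for illustration and are not taken from the text.

```python
import math


def condition_number(M, preconditioned):
    """Ratio of largest to smallest 1-D Euler wave-speed magnitude.

    Speeds are scaled by the sound speed, so u = M and c = 1.
    Valid for 0 < M < 1 (the smallest speed vanishes at M = 1).
    """
    u, c = M, 1.0
    if not preconditioned:
        speeds = [u, u + c, u - c]
    else:
        Vr = min(abs(u), c)                    # assumed reference velocity
        alpha = 0.5 * (1.0 - (Vr / c) ** 2)
        u_p = u * (1.0 - alpha)                # modified convective speed
        c_p = math.sqrt((alpha * u) ** 2 + Vr ** 2)  # modified sound speed
        speeds = [u, u_p + c_p, u_p - c_p]
    mags = [abs(s) for s in speeds]
    return max(mags) / min(mags)


for M in (1e-3, 1e-2, 1e-1):
    print(M, condition_number(M, False), condition_number(M, True))
```

Without preconditioning the ratio grows like 1/M, whereas under the assumed form it stays of order one as the Mach number goes to zero, consistent with the Mach-number-independent convergence reported for the preconditioned system.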


Figure 7: Navier-Stokes Eqns.: CD-ADI, M=0.001,
Re_cell=0.1, Viscous Preconditioning, CFL=5,
VNN=5

To  complete  the  stability  survey  for  the  Euler  equations,  we 
note  that  the  stability  results  for  the  artificial 
compressibility  version  of  the  incompressible  equations 
are  identical  to  the  low  Mach  number  results  in  Figs.  2  and 
4.  These  results  clearly  demonstrate  the  ability  of  the 
preconditioning  method  to  apply  equally  well  to 
compressible  and  incompressible  solutions. 

Stability results for the preconditioned Navier-Stokes
equations are given in Fig. 7. These results are for the
central-differenced ADI system at a cell Reynolds number of
0.1 and a Mach number of 0.01. (Note that results for high cell
Reynolds numbers are identical to the Euler results given in
Fig. 2.) The viscous preconditioning is not quite as
effective in controlling the stability eigenvalues as in the
case of the Euler equations, but it still improves the
stability map dramatically as compared to
non-preconditioned results. Eigenvalues over most of the
domain are around 0.9, with some increase toward the higher
wavenumbers that arises because of the absence of diffusion
in the continuity equation. The addition of artificial
diffusion in continuity eliminates this difficulty and
provides good viscous convergence, as is shown next.
Comparison with stability results based on the
non-preconditioned equations, or on preconditioning with
$\rho'_p$ set to its inviscid value rather than its viscous value,
shows a substantial deterioration in eigenvalues for either case.
Without preconditioning, the eigenvalues become very
stiff, and while inviscid preconditioning changes the
stability eigenvalues, it does not improve them. Clearly,
viscous preconditioning is needed as the cell Reynolds
number decreases.

Figure 8 demonstrates the effectiveness of the viscous
preconditioning for the Navier-Stokes equations for a second
simple problem, that of fully developed flow in a pipe.
Again, the initial condition corresponds to the exact
solution plus a small perturbation. The figure shows the
number of iterations required to converge to machine
accuracy  for  cell  Reynolds  numbers  ranging  from  10"^  to 
10. With viscous preconditioning, convergence is seen to
be independent of cell Reynolds number over the entire
spectrum. Solutions with inviscid preconditioning show a
dramatic slowdown in convergence at the smaller Reynolds
numbers, while computations with no preconditioning were

Figure 8: Convergence of the viscous straight duct
case at various Cell Reynolds numbers using inviscid
preconditioning or viscous preconditioning.


very  irregular,  and  frequently  did  not  converge.  The  flow 
Mach  number  for  these  calculations  is  taken  as  10'^. 

7  REPRESENTATIVE  SOLUTIONS  FOR 
PRACTICAL  PROBLEMS 

Thus  far  we  have  shown  how  to  transform  the  time 
derivatives  so  that  the  equations  are  well  conditioned  over 
the  entire  Reynolds-number  /  Mach-number  regime.  We 
have  then  used  stability  results  for  both  the  Euler  and  the 
Navier-Stokes  equations  to  verify  that  these  preconditioned 
equations  provide  effective  damping  factors.  In  addition,  we 
have also shown that the preconditioned equations provide
uniform  convergence  for  simple  problems  at  all  Reynolds 
numbers  and  Mach  numbers.  The  ultimate  proof  of 
convergence  enhancement  must,  however,  rest  upon 
demonstration  of  effectiveness  in  practical  problems.  Over 
the  past  several  years  we  have  applied  these  systems  to  a 
broad  variety  of  applications  including  low  speed 
compressible  flows,  combustion  problems,  incompressible 
flows,  supercritical  fluids  and  extrusion  modeling.  Space 
does not permit a complete demonstration of all these
examples,  but  we  present  some  representative  results  to 
demonstrate  the  capabilities. 

Figure 9 shows results for laminar flow over a backstep at a
Reynolds number of 200. The u-velocity contours are
shown, along with the convergence rates for the
preconditioned and non-preconditioned cases. These
computations are done with the line Gauss-Seidel algorithm.
Clearly, viscous preconditioning provides a major
enhancement to the convergence rate.

As a second example, we consider the flow through a
converging-diverging rocket nozzle. The turbulent
boundary layers in this nozzle are very thin because of the
high Reynolds number and strong wall cooling. The
corresponding  strong  grid  stretching  (aspect  ratios  larger 
than  10^)  required  near  the  wall  introduces  important  low 
Reynolds  number  effects  in  this  otherwise  high  Reynolds 
number flow. With standard algorithms, the solution
converges at a reasonable rate for about four orders of
magnitude (which would appear to be sufficient), and
then switches to a very slow rate of convergence. With
preconditioning, the convergence continues to machine
zero at a rate that is faster than the initial convergence of the
non-preconditioned solution. The heat flux to the wall is
shown in Fig. 10 as a function of axial distance for both
calculations at several time steps. The lower plot shows
that the preconditioned solution gives reasonable results
after only 200 iterations, while after 400 iterations, the heat
flux is indistinguishable from the machine-accuracy results.
The standard algorithm produces very different results. After
2000 iterations (which corresponds to three orders of
magnitude reduction in the global residuals), the wall heat
flux is only about half its final converged value, and it takes
more than 20,000 time steps to come within plotting
accuracy of the fully converged result. When converged to
machine accuracy, both the standard and the preconditioned
algorithms give identical results. This shows that low
Reynolds number cells near the wall (which determine the
wall heat flux) can also totally control the overall
convergence of the solution.

As a final example, we present a simulation of the flow in a
uni-element gaseous rocket. The flowfield is generated by
two co-annular jets entering through the left end of a
cylinder whose diameter is 50 mm. The gas in the inner
stream is oxygen, while that in the outer stream is
hydrogen. The outer diameter of the hydrogen jet is 12 mm,
giving a 38 mm backstep past which the jets expand. The
diameter of the oxygen jet is 8.4 mm, the hydrogen annulus
is 1 mm, and the two are separated by a sleeve of thickness
0.8 mm. An overall picture of the flowfield is given in Fig.
11. The backstep generates a large recirculating region near
the outer wall. The two gaseous streams begin to mix upon
exiting the injector, but the finite thickness of the sleeve
generates a small wake in which the hydrogen and oxygen
first start to mix. This mixing region is very important to
the computation because it acts as the primary
flame-holding mechanism for the resulting diffusion flame
(although the present results are for non-reacting flow).

Initial attempts at computing this flowfield with
preconditioning showed very poor convergence. Although
the non-preconditioned system did not converge well, it did
converge slightly better than the preconditioned one.
Careful investigation revealed that the reason for the
convergence difficulty was that a steady solution failed
to exist. The resulting flowfield was oscillatory in nature, as
determined by experimental observations. In addition, the
computations showed that the unsteadiness increased in strength
as the grid was refined. The reason the preconditioned
system showed poorer convergence was that it introduced a
smaller amount of artificial dissipation than did the
non-preconditioned system. (All computations were run with
upwind flux difference splitting.) The velocity contours in
the resulting unsteady solution are presented in Fig. 11 at
three different instants of time.

Figure 9: Contours of velocity and convergence for
the backward-facing step at a Reynolds number of
100 using the four-sweep Line Gauss-Seidel scheme.

Figure 10: Temporal convergence of the heat flux
along the nozzle wall for both the standard algorithm
and the enhanced algorithm. (Panels: Standard
Algorithm; Enhanced Algorithm.)

The source of the unsteadiness appears to originate in the
recirculating zone in the wake of the finite thickness sleeve
between the two inlet streams. A close-up view of this
region is given in Fig. 12. The details show that in this
recirculating region, the heavier oxygen from the lower
stream makes up the primary content of the recirculating
region. The hydrogen mixes with the oxygen only along
the upper side of the recirculating region. When the
recirculation region sheds a vortex, it induces a substantial
unsteadiness in the lighter hydrogen stream, which is then
propagated into the recirculation region downstream of the
backstep so that the entire flowfield oscillates in response
to this narrow wake region.

Figure 11: Unsteady Velocity Field near Injector
Post for Hydrogen/Oxygen at O/F = 4.

Plots of the time rate of change of the velocity at a particular
point near the wake region are given in Fig. 13. Even in
the unsteady solution, the preconditioned results show
larger amplitudes than do the non-preconditioned solutions.
This is again because of a diminished amount of artificial
damping. The impacts of increased artificial dissipation are
shown by the first-order upwind results, which are nearly
steady. Comparisons with analytical solutions for simple
shear layers indicate that the preconditioned results are more

Figure 12: Velocity Vector/Streamline Field near Injector
Post for Hydrogen/Oxygen Calculation

One issue with regard to preconditioned systems is their
impact on code robustness. We have encountered many
cases in which preconditioning improves robustness
(i.e., cases where the non-preconditioned code fails to
converge, while preconditioning makes convergence very
reliable). It must, however, be recognized that
preconditioning increases the local time step dramatically,
and this large time step may require some restriction at early
stages of the computation (although the restricted time step
may still be larger than the corresponding
non-preconditioned time step). The present example, however,
demonstrates that there are some cases for which the
preconditioning may not improve convergence because a
steady solution does not exist. In these cases, the
preconditioned system proves its worth in an unsteady,
iterative solution procedure.

8  CONCLUSIONS 

The proper limiting forms of the equations of motion at
low speeds and in diffusion-dominated regions have been
obtained by perturbation expansions and used as the basis
for defining a preconditioning matrix for convergence
enhancement. The expansion results show that $\rho_p$ is the
most important variable in controlling convergence, while
$\rho_T$ and $h_p$ are of secondary importance. Convergence
control can be obtained by replacing these physical
derivatives by artificial ones in the time derivatives, while
retaining the physical quantities in the flux terms so the
solutions are unchanged. Appropriate replacement terms
for these quantities obtained from the expansion procedures
are then generalized so that they approach the physical
quantities in the transonic and supersonic regimes.

Following the development of a generalized preconditioner
that ensures that the condition number of the Jacobian
matrices of the equations of motion remains of order one at
all Mach numbers, the resulting convergence characteristics
are first checked by means of stability theory. The
effectiveness of the methods is then verified by
computations of a variety of problems, starting first with
simple applications and then going to practical examples.
Efficient, uniform convergence is demonstrated for a variety
of applications covering a range of Reynolds and Mach
number conditions. Overall, it is demonstrated that
convergence enhancement of the Euler equations at low
speeds is quite easy and can be readily ensured. Extension to
the Navier-Stokes equations requires more care, but the
present procedure provides much improved convergence
rates in some of the traditional problem areas for
time-marching methods, while having no detrimental effects in
regimes where the methods already work efficiently.

Figure 13: Time History of Axial Velocity at one
point in the Injector Flowfield for various differencing
schemes

Abstract

Implementation issues associated with the application
of Krylov subspace iterative methods, such as
Newton-GMRES, are presented within the framework
of practical CFD applications. This paper will
categorize, evaluate and contrast the major ingredients
(function evaluations, matrix-vector products
and preconditioners) of Newton-GMRES Krylov subspace
methods in terms of their effect on the local
linear and global nonlinear convergence, memory
requirements, and accuracy. The discussion will focus
on Newton-GMRES in both a structured multi-zone
incompressible Navier-Stokes solver and an unstructured
mesh finite-volume Navier-Stokes solver.
Approximate vs. exact matrix-vector products, effective
preconditioners and other pertinent issues will be
addressed.

1  Introduction 

Interest in iterative methods in CFD has been motivated
not only by the requirement for better convergence
and speed of numerical codes, but also by
the availability of faster, larger memory serial and
parallel machines. The coupling of Newton's method
with iterative solvers is an effective approach for solving
the large systems of nonlinear equations which
arise from discretized forms of the Euler and
Navier-Stokes equations. One of the main motivations for
the use of Newton's method is the possibility of
superlinear (and in some cases quadratic) asymptotic
convergence. References [7, 21] are examples of successful
implementations of exact Newton's method
for two-dimensional Navier-Stokes codes. Most of the
conventional implicit schemes used today are effectively
approximate-Newton methods. The approximations
appear in the form of simplifications in the
functional Jacobian or some form of under/over relaxation
strategy, see e.g. [11] or [15]. In practice these
simplifications are employed for reasons such as efficiency,
implementation ease, or non-analyticity of operators
(e.g., discrete limiters in differencing schemes
based on Riemann solvers). Over a wide range of
numerical methods developed for the solution of the
multidimensional Navier-Stokes equations, the rigorous
application of Newton's method would require
the inversion of a large block banded matrix, which,
even by today's standards, poses many obstacles in
terms of memory requirements and speed. An alternative
to direct matrix inversion is the use of iterative
matrix solution methods. In particular, the class
of Krylov subspace methods known as GMRES [19]
will be considered. Wigton [23] was the first to successfully
implement GMRES for a two-dimensional
Navier-Stokes code.

The difficulties associated with iterative methods
such as GMRES lie in the rapid expansion of memory
requirements inherent in the embedded Arnoldi
process (storing the Krylov subspace vectors), the
need to perform the matrix-vector "Ap" products
(which sometimes requires the storage of the matrix
A), and the preconditioning of the system of equations
by some approximate inverse of A to improve
the convergence of GMRES.

The purpose of this paper is to focus on the implementation
details and specific results from the
application of Newton-GMRES to both a structured
multi-zone incompressible Navier-Stokes code
(INS2D, Rogers [17]) and an unstructured mesh
Navier-Stokes code due to Barth [5]. We use these
two codes as case studies, but the lessons learned,
approximations assumed, results obtained, and general
conclusions are applicable to most implementations
of Newton-GMRES to systems of PDEs.

2  General  Formulation 

In a general form, we cast the discrete approximation
to the steady multi-dimensional Navier-Stokes equations
as

$\mathcal{R}(Q) = \mathcal{R}(Q_{j-1,k}, Q_{j,k}, Q_{j+1,k}, Q_{j,k-1}, \ldots)$    (1)

where we are representing the support of the operator
as involving neighboring points in a computational
mesh and Q is the solution variable (typically
the conserved variables). Although this representation
appears in a structured mesh form and is rather
compact (involving only three points in each computational
direction), we intend it to also represent an
unstructured mesh template and possibly higher order,
higher dimensional, more broadband support.
We shall refer to this as the Function Evaluation
step in the overall process. For fixed point solutions
(steady-state) we require the solution of $\mathcal{R}(Q) = 0$.

A time accurate approach to the solution assumes
the form of

$\frac{\partial Q}{\partial t} + \mathcal{R}(Q) = 0$    (2)

which can either represent the artificial-compressibility scheme
for the incompressible Navier-Stokes equations [17] or
the full Navier-Stokes unstructured mesh scheme [5].
Applying implicit Euler time differencing with the
usual Taylor series linearization in time we have

$\left[ \frac{D}{\Delta t} + \frac{\partial \mathcal{R}(Q)}{\partial Q} \right] \Delta Q = -\mathcal{R}(Q^n)$    (3)

with $\Delta Q = Q^{n+1} - Q^n$, $\partial \mathcal{R}(Q)/\partial Q$ the Jacobian ($A$)
of the Function Evaluation, $\mathcal{R}(Q)$, and $D$ a positive
diagonal matrix. For $\Delta t \to \infty$ this is exactly
Newton's method and for finite $\Delta t$ a relaxed form of
Newton's method. In many applications of Newton-like
methods to the Euler and Navier-Stokes equations
this time-like relaxation is used to start the solution
process. A finite time step $\Delta t$ is used initially
to get past the rather violent nonlinear startup and
then increased to $\Delta t \to \infty$, leading to rapid linear,
super-linear or quadratic convergence depending on
the characteristics of the Newton solver, e.g. [21] or
[5]. It is convenient to recast Eq. 3 in the general
form

$b - Ax = 0$,    (4)

where $b = -\mathcal{R}(Q)$, $x = \Delta Q$ and $A$ is a matrix operator.

The numerical process involved in solving Eq. 3 will
be referred to here as the Inner Iteration at a particular
step n. The overall iteration of the nonlinear
system will be referred to as the Outer Iteration.
There are a number of successful approaches to
the Inner Iteration. In the case of structured mesh
applications, A represents a sparse block banded matrix
which can be solved with various methods such
as point or line relaxation [17] or approximate
factorization, e.g. [14]. In the unstructured mesh case,
A may not have a simple underlying structure, but
the Inner Iteration can be successfully solved with
a wide variety of relaxation techniques [22], [5]. For
the present discussion we shall focus on the GMRES
Krylov projection technique for solving the Inner
Iteration.

The GMRES (Generalized Minimal RESidual)
method was introduced by Saad and Schultz [19]
for solving large sparse systems of linear equations.
The GMRES algorithm is a Krylov subspace method
where, given a matrix $A \in \mathbb{R}^{N \times N}$, a vector $v \in \mathbb{R}^{N}$
and an integer $m \geq 1$, the Krylov subspace associated
with A, v and m is defined as

$K_m(A, v) = \mathrm{span}\{v, Av, A^2 v, \ldots, A^{m-1} v\}$.    (5)

In the GMRES algorithm an initial guess $x_0$ to the
solution of the linear system is given, from which the
initial residual is defined

$r_0 = b - A x_0$.    (6)

The GMRES method then attempts to find
$z_m \in K_m(A, r_0)$ such that the residual vector
$b - A(x_0 + z_m)$ is small. This is done so that at each iteration
the residual norm is minimized. One important parameter
for the GMRES method is the size of the
subspace m. As m increases, the memory increases
linearly and the computation quadratically. The parameter
m is usually chosen based on storage requirements
and effectiveness of the Inner Iteration. In
the discussion below, we will have more specific things
to say about this requirement and its effect on the
overall process. To avoid the increasing memory and
computation requirements with increasing m, a common
modification of GMRES is to apply restarts. An
upper bound $m_r$ on m is chosen and if convergence is
not reached, the Krylov subspace process is restarted
with the current residual replacing $r_0$. In this
case, the memory requirements are traded off against
the convergence of the Inner Iteration process, and
this will definitely affect the Outer Iteration.
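
The restart strategy described above can be sketched compactly. The following is a minimal dense NumPy illustration of restarted GMRES(m) built on the Arnoldi process with modified Gram-Schmidt orthogonalization; the function name, tolerances and the small test matrix are assumptions chosen for illustration, and no preconditioning is applied.

```python
import numpy as np


def gmres_restarted(A, b, m=10, max_restarts=50, tol=1e-10):
    """Restarted GMRES(m): minimize ||b - A x|| over x0 + K_m(A, r0),
    then restart from the current residual."""
    n = b.size
    x = np.zeros(n)
    for _ in range(max_restarts):
        r = b - A @ x
        beta = np.linalg.norm(r)
        if beta <= tol * np.linalg.norm(b):
            break
        V = np.zeros((n, m + 1))   # Krylov basis: storage grows linearly in m
        H = np.zeros((m + 1, m))   # upper Hessenberg matrix from Arnoldi
        V[:, 0] = r / beta
        k = m
        for j in range(m):
            w = A @ V[:, j]                 # the matrix-vector product
            for i in range(j + 1):          # modified Gram-Schmidt
                H[i, j] = V[:, i] @ w
                w -= H[i, j] * V[:, i]
            H[j + 1, j] = np.linalg.norm(w)
            if H[j + 1, j] < 1e-14:         # subspace is invariant: stop early
                k = j + 1
                break
            V[:, j + 1] = w / H[j + 1, j]
        # Small least-squares problem: min || beta*e1 - H y ||
        e1 = np.zeros(k + 1)
        e1[0] = beta
        y, *_ = np.linalg.lstsq(H[:k + 1, :k], e1, rcond=None)
        x = x + V[:, :k] @ y
    return x


# Hypothetical nonsymmetric, diagonally dominant test system.
n = 40
A = 4.0 * np.eye(n) - 1.5 * np.eye(n, k=-1) - 0.5 * np.eye(n, k=1)
b = np.ones(n)
x = gmres_restarted(A, b, m=10)
print(np.linalg.norm(b - A @ x))
```

The array `V` makes the memory trade-off visible: each restart cycle stores m + 1 vectors of the problem size, while the orthogonalization loop accounts for the quadratic growth of work with m.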

Numerical experience has shown that success or
failure of GMRES hinges critically on adequate
preconditioning of the linear systems to be solved. A
preconditioning matrix M is usually applied in either
a left-preconditioning $M(b - Ax) = 0$ or
right-preconditioning $b - AMy = 0$ (with $y = M^{-1}x$)
fashion. Ideally M should be chosen to be an approximation
to $A^{-1}$. Although the most successful and
popular form of M appears to be ILU [13] (Incomplete
Lower-Upper Factorization), we will also consider
alternate preconditioners in the next section.
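
The two ways of applying M can be checked on a small dense example. The sketch below uses a simple diagonal (Jacobi) approximation to $A^{-1}$ as a stand-in for M; this choice, the test matrix and the variable names are assumptions made for illustration and are not the ILU preconditioner favored in the text.

```python
import numpy as np

n = 50
# Hypothetical badly scaled test matrix: diagonal spanning four decades
# plus weak nearest-neighbor coupling.
d = np.logspace(0, 4, n)
A = np.diag(d) + 0.1 * (np.eye(n, k=1) + np.eye(n, k=-1))
b = np.ones(n)

M = np.diag(1.0 / np.diag(A))   # Jacobi stand-in for M ~ A^{-1}

# Left preconditioning: solve M (b - A x) = 0, i.e. (M A) x = M b.
x_left = np.linalg.solve(M @ A, M @ b)

# Right preconditioning: solve b - A M y = 0, then recover x = M y.
y = np.linalg.solve(A @ M, b)
x_right = M @ y

print(np.linalg.cond(A), np.linalg.cond(M @ A), np.linalg.cond(A @ M))
```

Both forms return the same solution x; what changes is the operator seen by the Krylov method, whose condition number drops by several orders of magnitude in this example.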

The important ingredients of the Newton-GMRES
method which we will focus on in this paper are the
Ap products required to form the Krylov subspace
vectors $K_m(A, v)$, the choice of the preconditioner
M, the size of the subspace m and restart size $m_r$,
and the storage requirement influenced by all these
factors. We will attempt to put the various trade-offs
in terms of memory requirements, convergence and
efficiency in perspective (in particular for the two
approaches discussed here, but also in general).
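
The time-relaxed Newton iteration of Eq. 3 can be sketched on a small model problem. The two-equation residual, the starting point, the initial time step and the growth factor below are all hypothetical choices for illustration; D is taken as the identity and the Inner Iteration is replaced by a direct solve.

```python
import numpy as np


def residual(Q):
    """Hypothetical steady residual R(Q) with a root at Q = (1, 2)."""
    x, y = Q
    return np.array([x**2 + y - 3.0, x + y**2 - 5.0])


def jacobian(Q):
    """Exact Jacobian dR/dQ of the residual above."""
    x, y = Q
    return np.array([[2.0 * x, 1.0], [1.0, 2.0 * y]])


def relaxed_newton(Q0, dt0=0.5, growth=10.0, tol=1e-12, max_outer=50):
    """Outer Iteration of Eq. 3 with D = I:
    [I/dt + dR/dQ] dQ = -R(Q^n),  Q^{n+1} = Q^n + dQ."""
    Q = np.array(Q0, dtype=float)
    dt = dt0
    history = []
    for _ in range(max_outer):
        R = residual(Q)
        history.append(np.linalg.norm(R))
        if history[-1] < tol:
            break
        A = np.eye(Q.size) / dt + jacobian(Q)
        dQ = np.linalg.solve(A, -R)   # Inner Iteration (direct solve here)
        Q += dQ
        dt *= growth                  # dt -> infinity recovers Newton's method
    return Q, history


Q, history = relaxed_newton([2.0, 1.0])
print(Q, len(history))
```

The small initial time step damps the first updates; as dt grows the 1/dt term vanishes and the iteration approaches an unrelaxed Newton step.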

3  Structured Mesh Incompressible Navier-Stokes

Rogers  [17]  has  implemented  the  Newton-GMRES 
algorithm  into  a  two-dimensional  incompressible 
Navier-Stokes  code  (INS2D)  and  has  made  some 
significant  comparisons  with  the  conventional  tech¬ 
niques  of  implicit  point  and  implicit  Gauss-Seidel 
line relaxation. The INS2D flow code [18] solves
the Reynolds-averaged incompressible Navier-Stokes
equations using the method of artificial compressibility
[9]. It is capable of handling multiple-zone structured
grids using either a patched multi-block (pointwise
continuous) interface, or an overlaid (chimera)
interface between zones. The boundary conditions
at the physical boundaries and at zonal boundaries
are applied in an implicit fashion during the solution
process. A third-order upwind-differencing scheme
based on the method of Roe [16] is used to discretize
the convective terms, and the viscous terms are
differenced using second-order central differences. The
system  of  equations  is  integrated  in  pseudo-time  us¬ 
ing  an  implicit  Euler  time  discretization.  Typically, 
the  time  step  is  set  to  infinity  (10®)  which  results  in  a 
Newton's method approach in which either the implicit
point or line relaxation schemes or, more specific to
this paper, GMRES is used for the Inner Iteration.

In the current implementation, the Jacobian $A$ is
formed based on a first-order differencing of the
convective terms, whereas third-order differencing is used
for $R(Q)$. In addition, approximate Jacobians of
the Roe flux differences from the upwind-differencing
scheme are used in the definition of $A$, see [1] for more
details. The first-order difference operator is used
to reduce the bandwidth of the resulting $A$ matrix,
which lowers the memory and computational
requirements for the solution of Eq. 3. However, this use of
approximate Jacobians can also slow the convergence
to a steady state, that is, the Outer Iteration
nonlinear Newton process is affected.

The GMRES implementation is preconditioned using
block ILU(0) [13] and the matrix $A$ is stored
so that Ap products can be efficiently formed and
the ILU process streamlined. For comparison, block
point  relaxation  and  block  line  relaxation  are  used 
as  both  the  Inner  Iteration  solver  and  as  precon¬ 
ditioners  for  the  GMRES  Inner  Iteration  process. 
Including  a  subspace  size  typically  on  the  order  of 
m  =  10  leads  to  additional  storage  requirements  as 
discussed  below  which  are  somewhat  of  a  burden  in 
two  dimensions  and  would  be  a  significant  hindrance 
in  three  dimensions.  The  use  of  the  approximate  Ja¬ 
cobian  (due  to  the  first  order  form  and  the  lineariza¬ 
tion  errors  associated  with  the  Roe  solver)  produces 
an  approximate  Newton’s  method  and  therefore  lin¬ 
ear  convergence  is  realized  as  opposed  to  the  potential 
for  quadratic  convergence. 
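The difference between quadratic and linear convergence can be seen on a scalar model problem. The sketch below is not taken from INS2D: the residual $q^3 - 2$ and the 30% error deliberately injected into the derivative are assumptions chosen only to mimic the effect of an approximate Jacobian on the Outer Iteration.

```python
def newton(F, J, q0, tol=1e-12, max_iter=100):
    """Newton iteration q <- q - F(q)/J(q); returns solution and residual history."""
    q, history = q0, []
    for _ in range(max_iter):
        r = F(q)
        history.append(abs(r))
        if abs(r) < tol:
            break
        q = q - r / J(q)
    return q, history

F = lambda q: q**3 - 2.0                 # hypothetical scalar residual
J_exact = lambda q: 3.0 * q**2           # exact derivative
J_approx = lambda q: 1.3 * J_exact(q)    # deliberately inexact derivative

q_exact, hist_exact = newton(F, J_exact, 2.0)
q_approx, hist_approx = newton(F, J_approx, 2.0)
print("iterations with exact Jacobian:      ", len(hist_exact) - 1)
print("iterations with approximate Jacobian:", len(hist_approx) - 1)
```

With the inexact derivative the residual shrinks by a roughly constant factor per step (linear convergence), so more Outer Iterations are needed, although each one may be cheaper.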

Rogers [17] examines a wide range of cases and
options in his paper on the Newton-GMRES
implementation. Table 1 shows the characteristics of the
cases presented and itemizes the costs of the various
schemes for each case broken down by the
fundamental steps in the algorithm. Base memory (B
MW) includes all overhead storage for the algorithm
including memory for either L (line relaxation), P
(point relaxation) or the Ap product in G (GMRES)
($\approx 76$ words/point). The additional memory (A
MW) is composed of subspace size ($\approx 3 \times (m + 4)$
words/point) and preconditioner ($\approx 9$ words/point)
contributions for GMRES. The timings are in ms/pt
;  milliseconds/point  to  convergence,  (maximum  di¬ 
vergence  «  10“®).  The  standard  approaches  of  point 
relaxation  and  line  relaxation  are  compared  directly 
with  the  Newton-GMRES  scheme  and  are  also  as¬ 
sessed  as  preconditioners  for  Newton-GMRES.  The 
first few cases are for a NACA 4412 airfoil at an angle
of attack $\alpha = 13.87^\circ$ and a Reynolds number
$Re = 1.5 \times 10^6$ and are computed on a set of refined
grids. The multi-element case is a three-element airfoil
at α = S’* and $Re = 9 \times 10^6$; a schematic of the
grid system is shown in Figure 1.
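The per-point memory counts quoted above can be checked against the memory columns of Table 1. The short Python sketch below does this for the GMRES(10)+ILU rows; the function name ins2d_memory_mw is hypothetical, and the per-point word counts are the approximate values given in the text, so agreement to the last printed digit is not guaranteed.

```python
def ins2d_memory_mw(n_points, m=10, base_words=76, precond_words=9):
    """Return (base, additional) memory in megawords for GMRES(m) + ILU(0)."""
    base = base_words * n_points / 1.0e6
    additional = (3 * (m + 4) + precond_words) * n_points / 1.0e6
    return base, additional

grids = {
    "Grid 1": 119 * 31,
    "Grid 2": 237 * 61,
    "Grid 3": 473 * 121,
    "Multi-element": 68_000,
}
for name, n_points in grids.items():
    base, additional = ins2d_memory_mw(n_points)
    print(f"{name:14s} B MW = {base:5.2f}   A MW = {additional:6.3f}")
```

For Grid 1 this gives about 0.28 MW of base memory and 0.188 MW of additional memory, in line with the G(10)+ILU row of Table 1.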

Figures 2, 3, 4 and 5 show comparisons for the above
cases, where the symbols mark every 50 Outer
Iterations. These results indicate that
the GMRES-ILU combination is the most efficient in
terms of computation time. On the other hand, the
negative aspect of the GMRES-ILU combination is
its memory requirements. By examining the trade-offs
between CPU time to convergence and memory
requirements, optimal choices can be made.

For example, Figure 2 shows the effect on CPU
time to convergence for various choices of $m$. A
subspace size $m = 10$ seems to be optimal in terms
of computational cost, including reasonable added
memory requirements. Also, note that the converging
cases of $m = 10, 20, 40$ each required 50 Outer
Iterations to reach the same level of convergence
(CPU times are larger, reflecting the added
computational cost of a larger subspace size). This is not
surprising since the inexact Jacobian used in this scheme
limits the Inner Iteration process to linear
convergence. Therefore, beyond some tolerance level it does
not pay to converge the Inner Iteration further, since
doing so only incurs additional cost in terms
of CPU time and memory.


Figure 6 shows the effect of decreasing the Inner
Iteration tolerance level ε, thereby solving the GMRES
step more accurately. In this case, reaching
a given convergence criterion, e.g. an Outer Iteration
residual of 10“®, in the least number of iterations
requires decreasing ε; ε = 10“® gets there in 50 Outer
Iterations. On the other hand, the CPU time cost
(shown in milliseconds/point) and average subspace
size $m$ (which leads to additional memory requirements
proportional to $m$) indicate that a loose tolerance,
say ε ≈ 10“^, and small $m$ produce the most efficient
combination. This leads to $m = 10$ as the optimal
choice both in terms of CPU efficiency and memory
requirements.

Memory estimates for the three-dimensional code
INS3D include a base memory of 146 words/point,
additional GMRES memory of $4 \times (m + 4)$ words/point
and preconditioner memory of 16 words/point.
Thus for GMRES(10)+ILU(0) the total memory
is 218 words/point. Examples include a simple
wing: 0.2 million points (Mpoints), 43.6 MW; a
wing+slat+flap: 1.6 Mpoints, 349 MW; and a C17
aircraft: 25 Mpoints, 5450 MW = 5.45 GW. These
requirements are excessive in three dimensions and
need to be reduced if these codes are to be used in
practice.


Case         Method       B MW    A MW    ms/pt

Airfoil      L(5)         0.28    0.006    1.98
Grid 1       P(20)        0.28    0.001    2.31
119x31       G(10)+LR     0.28    0.161    3.17
             G(10)+PR     0.28    0.156    2.12
             G(10)+ILU    0.28    0.188    1.14

Airfoil      L(10)        1.10    0.011    3.68
Grid 2       P(20)        1.10    0.002    5.13
237x61       G(5)+L       1.10    0.618    3.77
             G(10)+P      1.10    0.609    4.48
             G(10)+ILU    1.10    0.737    1.45

Airfoil                   4.35    0.023    8.79
Grid 3                    4.35    0.004   12.56
473x121                   4.35    2.920    3.91

Multi-       L(10)        5.17    0.015   49.7
Element      P(20)        5.17    0.003   14.7
68K pts      G(10)+ILU    5.17    3.468    5.37

Table 1: Cost comparisons of iterative methods for
INS2D for various cases and schemes. L(n): Line
Relaxation for n iterations, P(n): Point Relaxation
for n iterations, G(m)+X: GMRES with subspace
size m using scheme X for preconditioner.


Figure 1: Grid around a three-element airfoil.


Figure 2: Results for Grid 1 showing effect of
subspace size GMRES(m) on Newton-GMRES.


Figure 3: Results for Grid 1 Comparing Various
Schemes.
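The INS3D memory totals quoted in this section follow directly from the per-point counts. The arithmetic is written out below as a check; it uses only the figures given in the text, with $m = 10$.

```latex
\begin{align*}
\text{words/point} &= 146 + 4\,(m + 4) + 16 = 146 + 56 + 16 = 218 \quad (m = 10) \\
\text{simple wing:} \quad & 0.2 \times 10^{6} \times 218 = 43.6 \times 10^{6}\ \text{words} = 43.6\ \text{MW} \\
\text{wing+slat+flap:} \quad & 1.6 \times 10^{6} \times 218 = 348.8 \times 10^{6}\ \text{words} \approx 349\ \text{MW} \\
\text{C17 aircraft:} \quad & 25 \times 10^{6} \times 218 = 5450 \times 10^{6}\ \text{words} = 5.45\ \text{GW}
\end{align*}
```

Roughly a quarter of the 218 words/point (56 words) is Krylov subspace storage, so it is the part that scales with the choice of $m$.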

4  Unstructured Mesh Compressible
Navier-Stokes

Barth [5] has implemented the Newton-GMRES
algorithm into a two- and three-dimensional unstructured
mesh Navier-Stokes approach. In this case the
flow equations are solved using an edge-based
unstructured mesh quadrature scheme characterized as
an approximate and/or exact Roe Riemann solver
based on piecewise polynomial reconstruction; this
defines the Function Evaluation. Details of the
flow algorithm can be found in Barth [6, 2, 3]. The
details of the Newton-GMRES implementation
include exact Ap products for the second order
discretization with a first or second order approximate $A$
used only to construct the ILU preconditioner. Barth
presents three methods to compute Ap products: one
in which the exact Jacobian is stored (requiring a
significant increase in memory requirements), a
numerical evaluation using Frechet derivatives [8] which
is a matrix-free approach, and another matrix-free
approach using an exact product form where proper
linearization of the Riemann solvers and the
reconstruction/quadrature mechanism of the residual
vector assembly are used to produce the Ap product.
Since exact Ap products are used, quadratic
convergence can be realized. The preconditioned Inner
Iteration is also fairly efficient, employing a subspace
size on the order of $m_r = 12$ and a modest number of
restarts. The resulting scheme can be mapped very
successfully onto a parallel processor environment.


Figure 4: Results for Grid 2 Comparing Various
Schemes.


Figure 5: Results for Multi-Element Airfoil Comparing
Various Schemes.


Figure 6: Comparison for various Inner Iteration
tolerance levels for Grid 2 case. Tolerance levels are
given along with milliseconds per point to
convergence and average subspace size used $m$.

Figures 7, 8, 9 and 10 show an example computation for
viscous flow with turbulence about the multiple-element
airfoil geometry. This geometry has been
triangulated using the Steiner triangulation algorithm
described in [4], see Figure 7. The mesh contains
approximately 22,000 vertices with cells near the airfoil
surface attaining aspect ratios greater than 1000:1.
This example provides a demanding test case for CFD
algorithms. The experimental flow conditions are
$M_\infty = 0.20$, $\alpha = 16^\circ$, and a Reynolds number of $9 \times 10^6$.
Experimental results are given in [20] and computed
results are shown in Figure 8. Even though the wake
passing over the main element is not well resolved, the
surface pressure coefficient shown in Figure 9 agrees
quite well with experiment.

The  convergence  history  shown  in  Figure  10  is  typ¬ 
ical  for  aerodynamic  high  lift  computations. 

Some  of  the  more  practical  aspects  from  Barth’s  [5] 
implementation  of  Newton-GMRES  are  discussed  be¬ 
low. 


Figure  7:  Multi-element  airfoil  triangulation,  22,000 
vertices. 


Figure 8: Multi-element airfoil solution isomach
contours, $M_\infty = 0.2$, $\alpha = 16.0^\circ$, Re = 9.0 million.


Figure 9: Comparison of computational and
experimental surface pressure coefficients.


Figure 10: Solution convergence history.


4.1  Storage Requirements

In practice we will be solving systems of $l$ coupled
equations so that each nonzero entry of the matrix
is actually a small $l \times l$ block. The schemes
employed require data from distance-one neighbors in
the graph (mesh). In addition, the higher order
accurate schemes require distance-two neighbors in
building the scheme, see Barth [5, 3, 6]. First consider the
situation in which the scheme requires only
distance-one neighbors. The number of nonzero entries in each
row of the matrix is related to the number of edges
incident to the vertex associated with that row.
Equivalently, each edge $e(v_i, v_j)$ will guarantee nonzero
entries in the i-th column and j-th row and similarly
the j-th column and i-th row. In addition, nonzero
entries will be placed on the diagonal of the matrix.
From this counting argument we see that the number
of nonzero block entries, nnz, in the matrix is exactly
twice the number of edges plus the number of vertices,
$2E + N$ (approximately $7N$ in 2D). Table 2 (based on
a similar counting argument) shows approximate
requirements for storing distance-one and distance-two
neighboring information as a sparse matrix.
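The counting argument can be verified on a small mesh. The sketch below is only an illustration: it builds a structured grid of vertices with each cell split by one diagonal (a hypothetical stand-in for a general triangulation) and counts the distance-one block pattern directly.

```python
def triangulated_grid_edges(k):
    """Edges of a k x k vertex grid whose cells are each split by one diagonal."""
    idx = lambda i, j: i * k + j
    edges = []
    for i in range(k):
        for j in range(k):
            if j + 1 < k:
                edges.append((idx(i, j), idx(i, j + 1)))      # horizontal edge
            if i + 1 < k:
                edges.append((idx(i, j), idx(i + 1, j)))      # vertical edge
            if i + 1 < k and j + 1 < k:
                edges.append((idx(i, j), idx(i + 1, j + 1)))  # diagonal edge
    return edges

def count_nonzero_blocks(n_vertices, edges):
    """Count distance-one nonzero blocks: two per edge plus the diagonal."""
    pattern = {(i, i) for i in range(n_vertices)}
    for i, j in edges:
        pattern.add((i, j))
        pattern.add((j, i))
    return len(pattern)

k = 50
edges = triangulated_grid_edges(k)
n_vertices = k * k
nnz = count_nonzero_blocks(n_vertices, edges)
print(f"N = {n_vertices}, E = {len(edges)}, nnz = {nnz}, nnz/N = {nnz / n_vertices:.2f}")
```

The ratio nnz/N stays slightly below 7 on a finite mesh because boundary vertices have fewer neighbors, and approaches 7 as the mesh grows.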

Note that the entries of the sparse matrix
associated with Newton's method (for solution of the
Navier-Stokes equations and an associated one-equation
turbulence model) are actually small $5 \times 5$ and
$6 \times 6$ blocks in two and three dimensions respectively.
At  first  glance,  this  storage  requirement  appears  pro¬ 
hibitively  large.  While  this  may  be  true  to  some  ex¬ 
tent  today,  the  memory  capacity  of  computers  is  ex¬ 
panding  at  a  rapid  rate.  It  is  quite  reasonable  to  ex¬ 
pect  that  in  the  foreseeable  future  sufficient  memory 
will  be  available  for  solving  most  problems  of  engi¬ 
neering  interest.  Even  so,  it  is  possible  to  reduce,  and 
in  some  cases  eliminate,  the  explicit  storage  of  the 
Jacobian  matrix  without  compromising  the  favorable 
convergence  characteristics  of  Newton’s  method. 


Dim.    nnz (Distance-1)    nnz (Distance-2)

2       7N                  19N
3       14N                 55N

Table 2: Storage Estimates for Sparse Matrices.
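As a worked example, the two-dimensional estimates of Table 2 can be applied to the 22,000-vertex mesh of Figure 7 with $5 \times 5$ blocks. The figures below count matrix entries only; one word per entry is assumed, and the index arrays of the compressed storage scheme are ignored.

```latex
\begin{align*}
\text{distance-1:} \quad & 7N \times 5^2 = 175\,N = 175 \times 22{,}000 = 3.85 \times 10^{6}\ \text{words} \\
\text{distance-2:} \quad & 19N \times 5^2 = 475\,N = 475 \times 22{,}000 = 1.045 \times 10^{7}\ \text{words}
\end{align*}
```

Storing the full second order Jacobian therefore costs roughly 2.7 times as much as the distance-one matrix, which is why the lower order matrix is attractive as a preconditioner.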


4.2  Calculating  Analytic  Jacobian 
Derivatives 

In  this  section  we  address  the  task  of  computing  Jaco¬ 
bian  derivatives  for  Newton’s  method.  In  the  follow¬ 
ing  section  we  consider  the  related  task  of  multiplying 
an  arbitrary  vector  by  the  Jacobian  matrix. 

A major task in the overall calculation of the
Jacobian derivatives for the finite-volume discretization
is the linearization of the numerical flux vector with
respect to the two solution states, e.g. given the Roe
flux function [16]

$$h(u^L, u^R; n) = \frac{1}{2}\left(f(u^L, n) + f(u^R, n)\right) \qquad (7)$$

$$\qquad\qquad - \frac{1}{2}\,|A(u^L, u^R; n)|\,(u^R - u^L) \qquad (8)$$

we require the Jacobian terms $\frac{dh}{du^L}$ and $\frac{dh}{du^R}$. Here, $f$
is the flux function, $n$ a geometric normal, and $A = \frac{df}{du}$
the physical flux Jacobian, evaluated at some
combination of the right and left states of the flow variables,
$u^R$, $u^L$. Exact analytical expressions for these terms
are available [1]. In constructing the Jacobian
matrix for the entire scheme it is useful to conceptualize
the finite-volume scheme in composition form:

$$R(Q) = \mathcal{L}_1(\mathcal{L}_2(Q)), \qquad (9)$$

with $\mathcal{L}_1$ representing the flux quadrature and
accumulation step and $\mathcal{L}_2$ representing the data
reconstruction step. In this form, each operator requires
distance-1 information. The Jacobian matrix can
then be written as

$$\frac{dR}{dQ} = \frac{d\mathcal{L}_1}{d\mathcal{L}_2}\,\frac{d\mathcal{L}_2}{dQ}, \qquad (10)$$

with the critical observation that the Jacobian matrix
can be calculated as the sparse product of two
matrices. This could potentially be an expensive task,
but because of the special form of $\mathcal{L}_1$ and $\mathcal{L}_2$, the
resulting sparse product produces at most distance-2
fill and can be computed at reasonable cost.
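The distance-2 fill can be checked on the simplest possible graph. In the sketch below, a chain of 8 vertices and 0/1 pattern matrices are assumptions made for illustration; the two factors stand in for the Jacobians of the flux and reconstruction steps, each with a distance-1 pattern, and only the structural fill of their product is examined.

```python
import numpy as np

n = 8
# Distance-1 pattern on a chain: each row touches itself and its two neighbors.
D1 = (np.eye(n) + np.eye(n, k=1) + np.eye(n, k=-1)).astype(int)

# Stand-ins for the two Jacobian factors; both have the distance-1 pattern.
dL1 = D1.copy()
dL2 = D1.copy()

# Structural pattern of the product (nonnegative entries cannot cancel).
product_pattern = (dL1 @ dL2) > 0

rows, cols = np.nonzero(product_pattern)
max_distance = int(np.max(np.abs(rows - cols)))
print("largest vertex distance coupled by the product:", max_distance)
print("nonzeros in each factor:", int(D1.sum()), " in the product:", int(product_pattern.sum()))
```

On this chain the product couples each vertex to neighbors at most two edges away, so the fill is bounded by the distance-2 pattern however many rows the matrix has.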


4.3  Exact and Approximate Jacobian
Matrix-Vector Products

Consider the standard matrix equation $b - Ax = 0$.
Iterative matrix solution algorithms for this problem
require the computation of matrix-vector products
of the form $Ap$ for some arbitrary vector $p$. In the
approximate Newton algorithm

$$A = \frac{D}{\Delta t} + \frac{dR}{dQ} \qquad (11)$$

where $D$ is a positive diagonal matrix. In practice the
diagonal entries are locally scaled as an exponential
function of the norm of the residual

$$\frac{D_i}{\Delta t} \propto \frac{cfl_i}{cfl_{max}}, \qquad cfl_{max} = f(\|R(Q)^n\|)$$

so that when $\|R(Q)\| \to 0$, $cfl_{max} \to \infty$ and the
scheme approaches Newton's method. It should be
emphasized that by using this strategy, the scheme
is technically an approximate Newton method which
becomes exact only in the final few iterations of the
computation.

A major step in the matrix-vector product $Ap$ is
the computation of Jacobian derivatives in the
direction of $p$ (a Frechet derivative)

$$Ap = \frac{dR}{dQ}\,p. \qquad (12)$$

Several possible strategies exist for computing the
needed Frechet derivatives:


4.3.1  Sparse Matrix-Vector Multiply

The most straightforward strategy is to analytically
compute and store the Jacobian matrix using a
compressed storage scheme designed for sparse matrices.
This strategy has the added benefit that a copy of the
matrix can also be used as a preconditioner for the
iterative solver. In addition, the explicit storage also
permits the formation of the transposed matrix
problem which is often encountered in optimization
procedures coupled with Newton's method. Obviously,
a drawback of this approach is the large storage
requirement.

4.3.2  Approximate Frechet Derivatives

An alternative to analytically calculating Frechet
derivatives is to approximate them using finite
differences [12] [8] [10]. The required Frechet derivative is
a limiting form of the difference approximation

$$\frac{dR}{dQ}\,p = \lim_{\epsilon \to 0} \frac{R(Q + \epsilon p) - R(Q)}{\epsilon}.$$

The primary concern with this approach is the
accuracy of derivatives and the optimal choice for $\epsilon$. If
derivatives are not computed accurately then
methods such as GMRES iteration may stall or fail. Using
a forward difference approximation, $\epsilon$ must be
carefully chosen. In general it is insufficient to choose $\epsilon$
as a constant such as the square root of machine
precision. Johan [12] also mentions this fact and gives
some analysis for choosing $\epsilon$, but this analysis assumes
that $R(Q)$ is well scaled. A common choice for $\epsilon$ is
given by

$$\epsilon = \delta_0 + \frac{\delta_1}{\|p\|} \qquad (13)$$

with suitably chosen constants $\delta_0$ and $\delta_1$. An
alternative to forward differencing is to use a higher order
accurate formula such as central differencing at
double the computational cost.

The clear attraction of this approach is the low
memory requirement. On the other hand, the
numerical computation of Frechet derivatives does not
produce a matrix approximation which can be used
to precondition the system.
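A matrix-free product of this kind takes only a few lines. The sketch below is an illustration under stated assumptions: the periodic quadratic residual, its hand-coded Jacobian and the values of the two constants (delta0 = 0, delta1 = 1e-7) are all hypothetical, chosen so that the forward difference can be compared with the exact product.

```python
import numpy as np

def residual(Q):
    """Hypothetical smooth nonlinear residual: R_i = Q_i^2 + Q_{i-1} Q_i - 1 (periodic)."""
    return Q**2 + np.roll(Q, 1) * Q - 1.0

def jacobian(Q):
    """Exact Jacobian of the residual above, assembled densely for comparison only."""
    n = Q.size
    J = np.diag(2.0 * Q + np.roll(Q, 1))     # dR_i/dQ_i
    for i in range(n):
        J[i, (i - 1) % n] += Q[i]            # dR_i/dQ_{i-1}
    return J

def frechet_fd(R, Q, p, delta0=0.0, delta1=1e-7):
    """Forward-difference approximation of (dR/dQ) p with eps = delta0 + delta1/||p||."""
    eps = delta0 + delta1 / np.linalg.norm(p)
    return (R(Q + eps * p) - R(Q)) / eps

rng = np.random.default_rng(0)
Q = 1.0 + 0.1 * rng.standard_normal(12)
p = rng.standard_normal(12)

Jp_exact = jacobian(Q) @ p
Jp_fd = frechet_fd(residual, Q, p)
rel_err = np.linalg.norm(Jp_fd - Jp_exact) / np.linalg.norm(Jp_exact)
print("relative error of the finite-difference product:", rel_err)
```

Each product costs one extra residual evaluation and no matrix storage. Scaling the step by $1/\|p\|$ keeps the size of the perturbation independent of how the Krylov vector happens to be normalized; a zero vector $p$ would need to be handled separately.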


4.3.3  Exact  Product  Forms 

In  this  section  we  will  present  a  technique  for  con¬ 
structing  matrix-vector  products  which  is  an  exact 
calculation  of  the  Frechet  derivative.  Extension  to 
systems  and  the  inclusion  of  diffusion  terms  are  also 
handled  using  this  technique. 
Let $G(E, V)$ denote the triangulation in 2D or 3D
with $n$ vertices and $m$ edges. Next we define the
incidence matrix

$$C_{l,i} = \begin{cases} -1 & \text{if } v_i \text{ is the origin of edge } l \\ \phantom{-}1 & \text{if } v_i \text{ is the destination of edge } l \\ \phantom{-}0 & \text{otherwise} \end{cases} \qquad (14)$$

Let $h = h(u^L, u^R; n)$ denote the numerical flux
function as defined by Equation 8. For a system of $l$
coupled differential equations, the Jacobian matrix
entries are actually small $l \times l$ blocks. For ease of
exposition, we tacitly treat these small blocks as scalar
entries. Under these simplifications, the desired
matrix-vector product is given by

$$\frac{dR}{dQ}\,p = C^T\left(\left[\frac{dh}{du^L}\right]\left[\frac{du^L}{du}\right] + \left[\frac{dh}{du^R}\right]\left[\frac{du^R}{du}\right]\right)p \qquad (15)$$

where $\left[\frac{dh}{du^{L,R}}\right] \in \mathbb{R}^{m \times m}$ with nonzero diagonal elements,
and $\left[\frac{du^{L,R}}{du}\right] \in \mathbb{R}^{m \times n}$. If we do not incorporate
monotonicity enforcement into the reconstruction
procedure then a considerable simplification occurs in the
calculation of matrix-vector products. The main idea
is given in the following almost trivial lemma.

Lemma: Let $v = \mathcal{R}(U) = \mathcal{R}(u_1, u_2, \ldots, u_n)$ denote
an arbitrary order reconstruction operator. If $\mathcal{R}$
depends linearly on $u_i$ then

$$\frac{dv}{du}\,p = \mathcal{R}(p).$$

Proof: Linearity implies that

$$v = \mathcal{R}(u_1, u_2, \ldots, u_n) = \sum_{i=1}^{n} \alpha_i u_i$$

so that $\frac{dv}{du_i} = \alpha_i$. The desired result follows
immediately:

$$\frac{dv}{du}\,p = \sum_{i=1}^{n} \alpha_i p_i = \mathcal{R}(p).$$

This lemma suggests the following procedure for
calculation of matrix-vector products, from Eq. 15:

$$\frac{dR}{dQ}\,p = C^T\left(\left[\frac{dh}{du^L}\right]\mathcal{R}^L(p) + \left[\frac{dh}{du^R}\right]\mathcal{R}^R(p)\right) \qquad (16)$$

This amounts to a reconstruction of the vectors $p^L$
and $p^R$ from $p$ using the same reconstruction
operator used in the residual computation. Next, the
linearized form of the flux function is computed:

$$\left[\frac{dh}{du^L}\right]p^L + \left[\frac{dh}{du^R}\right]p^R.$$

Finally, the linearized fluxes are assembled using the
same procedure as the residual vector assembly. In
actual calculations, the conservative flow variables
are not reconstructed, thereby necessitating that a
change of variable transformation be embedded in the
formulation. This is not a serious complication.


4.4  Matrix Preconditioning

In the present applications, we consider a
preconditioning matrix based on the incomplete lower-upper
(ILU) factorization of the matrix $A$. ILU
preconditioning is a popular and robust preconditioning
procedure for use in iterative matrix solvers. ILU
factorization is a modification to the standard Gaussian
elimination for which the nonzero fill pattern is
either preimposed or determined dynamically based on
the size or location of fill elements. In this way the
amount of storage required can be specified and in
some instances minimized. Technical aspects of ILU
factorization such as existence and spectral properties
have been proven for M-matrices, but the general
applicability is much broader and well documented in
the literature. The triangular solves required in the
application of ILU preconditioning generally give the
method global support. This is usually considered a
favorable characteristic of the method.

The finite-volume scheme with high order data
reconstruction suggests two possible matrices suitable
for incomplete factorization.

1.  Distance-1 matrix preconditioning. Construct
the preconditioning matrix from the Jacobian
matrix associated with the lower (first) order
accurate discretization of the flow equations. This
matrix involves distance-1 neighbors in the
triangulation. Matrix-vector products are computed
"exactly" using the Jacobian matrix associated
with the full second order accurate scheme.

2.  Distance-2 matrix preconditioning. Use the
Jacobian matrix of the entire second order accurate
scheme for both matrix-vector products and
preconditioning.
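The exact product form of Eq. 16 can be sketched on a one-dimensional periodic mesh. Everything specific in the example below is an assumption made for illustration and is not Barth's implementation: the Burgers flux, the constant dissipation coefficient used in place of Roe's state-dependent $|A|$, the unlimited linear reconstruction and the function names. The steps mirror the procedure above: reconstruct $p^L$ and $p^R$ with the same operator used for the residual, form the linearized flux on each edge, then assemble with the residual assembly.

```python
import numpy as np

A_DISS = 1.0  # constant dissipation coefficient (assumption; Roe's |A| depends on the states)

def reconstruct(u):
    """Unlimited linear reconstruction of states at edge l, between cells l and l+1."""
    up1, um1, up2 = np.roll(u, -1), np.roll(u, 1), np.roll(u, -2)
    uL = u + 0.25 * (up1 - um1)      # extrapolated from cell l
    uR = up1 - 0.25 * (up2 - u)      # extrapolated from cell l+1
    return uL, uR

def flux(uL, uR):
    """Numerical flux with Burgers physical flux f(u) = u^2/2 and fixed dissipation."""
    f = lambda v: 0.5 * v**2
    return 0.5 * (f(uL) + f(uR)) - 0.5 * A_DISS * (uR - uL)

def assemble(h):
    """Apply C^T: edge l has origin cell l (-1) and destination cell l+1 (+1)."""
    return -h + np.roll(h, 1)

def residual(u):
    uL, uR = reconstruct(u)
    return assemble(flux(uL, uR))

def jac_vec_exact(u, p):
    """Exact (dR/du) p without forming the Jacobian matrix."""
    uL, uR = reconstruct(u)
    pL, pR = reconstruct(p)          # lemma: (dv/du) p = R(p) for a linear operator
    dh_duL = 0.5 * uL + 0.5 * A_DISS
    dh_duR = 0.5 * uR - 0.5 * A_DISS
    return assemble(dh_duL * pL + dh_duR * pR)

rng = np.random.default_rng(0)
u = rng.standard_normal(16)
p = rng.standard_normal(16)
eps = 1e-6
fd = (residual(u + eps * p) - residual(u - eps * p)) / (2.0 * eps)
print("max difference from central difference:", np.max(np.abs(jac_vec_exact(u, p) - fd)))
```

The comparison with a central difference is only a check. With a limiter in the reconstruction the operator is no longer linear and the lemma does not apply, which is why the text restricts the simplification to schemes without monotonicity enforcement.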


4.5  Performance  of  GMRES 

The  viscous  multi-element  test  problem  given  above 
provides  representative  matrices  for  evaluating  the 
GMRES  algorithm.  We  construct  approximate  New¬ 
ton  matrices  corresponding  to  flow  CFL  numbers  of 
10^  and  10®.  In  addition,  distance-1  and  distance-2 
preconditioning  matrices  are  used  to  accelerate  the 
algorithms. 
Figures 11-12 graph the convergence histories for
the GMRES algorithm and the two choices of
preconditioner. Since the matrix-vector products and
preconditioning solves dominate the iterative
calculation, convergence histories are plotted against the
number of matrix-vector products required. Each
GMRES iteration requires one matrix-vector
product. The GMRES algorithm is clearly adversely
affected by the distance-1 preconditioning. For this
case the distance-1 preconditioned system requires
roughly twice as many iterations as the distance-2
preconditioned  system.  In  fact  for  CFL  =  10®,  the 
convergence is unacceptably slow. In general we find
that when using the distance-1 preconditioning
matrix, an optimal CFL number exists for convergence
and efficiency, which is large but not infinite.
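For readers unfamiliar with ILU(0), the factorization used as a preconditioner in these experiments can be sketched as follows. The version below is a scalar, dense-array illustration and not the block sparse implementation of the flow codes; the function names and the small Laplacian test matrix are assumptions made for the example. Fill is simply discarded wherever the original matrix has a zero.

```python
import numpy as np

def ilu0(A):
    """Incomplete LU factorization keeping only the nonzero pattern of A."""
    n = A.shape[0]
    pattern = A != 0
    LU = A.astype(float).copy()
    for i in range(1, n):
        for k in range(i):
            if not pattern[i, k]:
                continue
            LU[i, k] /= LU[k, k]                     # multiplier stored in L
            for j in range(k + 1, n):
                if pattern[i, j]:                    # update only inside the pattern
                    LU[i, j] -= LU[i, k] * LU[k, j]
    L = np.tril(LU, -1) + np.eye(n)                  # unit lower triangular factor
    U = np.triu(LU)
    return L, U

def apply_ilu(L, U, r):
    """Apply the preconditioner: solve L U z = r with two triangular systems."""
    return np.linalg.solve(U, np.linalg.solve(L, r))

# Hypothetical test matrix: 5-point Laplacian on a 3 x 3 grid.
T = 2.0 * np.eye(3) - np.eye(3, k=1) - np.eye(3, k=-1)
A = np.kron(np.eye(3), T) + np.kron(T, np.eye(3))
L, U = ilu0(A)
mask = A != 0
print("max |LU - A| on the pattern of A:", np.max(np.abs((L @ U - A)[mask])))
print("max |LU - A| off the pattern:    ", np.max(np.abs((L @ U - A)[~mask])))
```

The product LU reproduces A on its own sparsity pattern and differs only where fill was dropped, so the storage equals that of the matrix being factored. This is the reason the distance-1 matrix gives a cheaper but weaker preconditioner than the distance-2 matrix.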


Figure 11: Viscous Flow matrix solution convergence
histories for the GMRES{SQ) algorithm at
CFL = 10® using ILU(0) distance-1 and distance-2
preconditioning matrices.


Figure 12: Viscous Flow matrix solution convergence
histories for the GMRES{Z0) algorithm at
CFL = 10® using ILU(0) distance-1 and distance-2
preconditioning matrices.


5  Summary

Practical aspects of Newton-GMRES algorithms
from working Navier-Stokes codes have been
presented. In particular, implementation issues, such
as memory requirements, accuracy requirements for
Ap products, trade-offs between full Newton and
relaxed Newton, and other pertinent approximations, have
been discussed. Two approaches have been
highlighted. In the incompressible Navier-Stokes code,
the best strategy appears to be an inexact Jacobian
(a first order accurate approximation to the third
order accurate Function Evaluation) for the Ap
products, a consistent ILU(0) preconditioner, a small
subspace size and fairly loose tolerances for Inner
Iteration convergence. In the unstructured mesh
approach, exact Ap products are successfully
coupled with a first order approximate ILU(0)
preconditioner and tighter tolerance levels for Inner
Iteration convergence. In both cases, an optimal strategy
is found producing enhanced efficiencies. Although
these conclusions are not universal, they do provide
guidelines and practical suggestions for general
implementations.
SUMMARY

This paper discusses a new numerical method which enables the
future application of Large Eddy Simulation to high Reynolds
number aerodynamic flows. The new numerical method uses
local grid refinement of hexahedral cells and the discontinuous
Galerkin finite element method. This method offers maximum
flexibility in grid adaptation and maintains accuracy on highly
irregular grids. The method is demonstrated with calculations of
inviscid transonic flow on a generic delta wing. The calculations
are done on two parallel shared memory computers and the
performance results are used to give estimates of the computing
time and memory requirements for a Large Eddy Simulation of
a clean wing on a NEC SX-4 supercomputer.


LIST  OF  SYMBOLS 


$b_K$          external boundary face of element K
$B$            boundary operator
$C^1[0,T]$     space of one time differentiable
               functions on the interval [0, T]
$\delta_{ij}$  Kronecker delta symbol
$E$            specific total energy
$e_K$          face of polyhedron K
$F^j(U)$       flux vector in Cartesian coordinate
               direction j


F(U)  inner  product  of  and  E 

T (U)  matrix  with  columns  F-' 

Fk  mapping  between  elements  A'  and  K 

7  ratio  of  specific  heats 

UaFa  path  in  phase  space  between 

andU^"*^^'') 
monotone  Lipschitz  flux 


K 

IC 

R 

meas(A') 

dK 

M 

[Mk] 

fV+ 

N{K) 

N^{K) 

n 

n 

dn 

p 

p\R) 


polyhedron  element  in  Th 

neighboring  elements  of  polyhedron  K 

master  element  of  polyhedron  K 

measure  of  polyhedron  K 

boundary  of  polyhedron  K 

maximum  number  of  polynomial  terms 

in  expansion  of  U/, 

mass  matrix  of  element  K 

set  of  positive  natural  numbers 

set  of  neighboring  elements  of  K 

indices  of  neighboring  elements  of  K 

in  the  ^-direction 

unit  outward  normal  vector 

flow  domain 

boundary  of  Q 

pressure 

space  of  polynomial  functions  of 
degree  <k  on  R 

space  of  functions  whose  images  under 
Fk  are  functions  in  P^(R) 


^'k 

C) 

0 

Ra' 


P 

span 

t 

T 

Th 

U 

tJjc 

U„ 

Uo 

V\k 

Vh 

-^JextiK) 


Vh 

Ujc 

Ui 

V 

v^(/0 

Wh 

X 

Xj 

xk' 

Aa 


c 

V 

V 

c 

6 


T 


limiter  function  defined  on  K 
components  of  limiter  function  on  K 
polynomial  basis  functions  on  R 
basis  functions  on  K 
trilinear  element  shape  functions 
residual  in  element  K 
indicator  functions  for  grid  adaptation  in 
T]  and  (  directions 
Euclidian  n-dimensional  space 
density 
linear  span 
time 

final  time 

triangulation  of  O 

conservative  flow  variables 

average  of  U  in  element  K 

conservative  flow  variables  specified  at  dQ 

initial  conservative  flow  variables 

U  restricted  to  element  K 

numerical  approximation  of  U 

U  at  ceU  face  taken  as  the  limit  from  the 

exterior  of  K 

U  at  cell  face  taken  as  the  limit  from  the 
interior  of  K 

maximum  U  in  /F  and  it’s  neighboring  cells 
minimum  U  in  /F  and  it’s  neighboring  cells 
components  of  U*  at  Gauss  quadrature  points 
in  cell  faces  of  R 

components  of  polynomial  expansion  of  U  in  /tT 

limited  components  of  polynomial  expansion 

coefficients  in  /F 

limited  flow  field  U  in  each  element 

vector  with  limited  moments  of  flow  field  Um 

Cartesian  velocity  components 

primitive  flow  variables 

vectors  with  each  component  Pi  £  P'‘{K) 

vectors  which  belong  to  space 

position  vector 

components  of  position  vector,  j  =  { 1,  2, 3} 
coordinates  of  comer  points  of  element  K 
length  of  cell  in  local  ^-direction 
local  coordinates  in  element  R 
for  all 

nabla  operator 

subset 

element 

composite  mapping 
tensor  product 
transposed 


Paper  presented  at  the  AGARD  FDP  Symposium  on  "Progress  and  Challenges  in  CFD  Methods  and  Algorithms” 
held  in  Seville,  Spain,  from  2-5  October  1995,  and  published  in  CP-578. 




INTRODUCTION 

Computational  Fluid  Dynamics  (CFD)  is  used  for  increasingly 
complicated  problems.  Many  advanced  applications  of  CFD, 
such  as  Large  Eddy  Simulation  (LES),  can  only  be  done  with 
sophisticated  grid  adaptation  algorithms  and  require  significant 
computer  resources.  The  aim  of  this  paper  is  to  demonstrate  a 
new  grid  adaptation  algorithm  for  future  application  to  Large 
Eddy Simulation. In LES the filtered Navier-Stokes equations are solved, which represent the part of the turbulent flow field that can be resolved on the grid. The turbulent length scales which cannot be resolved have to be modeled with subgrid scale turbulence models. This approach is quite successful in most parts of the flow field but, as already mentioned by Chapman [3], fails in the near wall region, which is critical for LES. Chapman proposed to use successively finer grids close to the wall to capture the viscous sublayer. This reduces the need to model the near wall region, where the basic assumption of LES, namely the separation of the flow field into large and small scales, is not valid.

Despite the significant progress made in LES since Chapman's paper, the proper solution of the near wall flow field is still one of the key elements preventing LES from being applied to more general problems in aerospace, see Moin and Jimenez [10]. The use of successively finer grids can only be done efficiently with sophisticated grid adaptation techniques and requires a numerical scheme which is accurate on highly irregular grids. In this paper a new algorithm is presented, using a combination of local grid refinement and the discontinuous Galerkin (DG) finite element method. This method is capable of efficiently resolving local phenomena such as shear layers and shocks and has the potential to be applied to LES of wall bounded turbulent flows by properly resolving the near wall region. Hexahedron cells are used as basic elements because they suffer less from loss of accuracy due to successive refinements than the more commonly used tetrahedron cells and are more suited to viscous flows. This paper, however, will be limited to inviscid flow in order to demonstrate the basic algorithm.

The discontinuous Galerkin method with Runge-Kutta time integration (RKDG) was originally proposed by Cockburn and Shu [4, 5, 6] for hyperbolic conservation laws. They proved that the RKDG method is TVB stable and satisfies a maximum principle for multi-dimensional scalar hyperbolic conservation laws. This work was mainly theoretical and limited to one- and two-dimensional flow fields. The extension to three dimensions was recently presented by van der Vegt [14]. The discontinuous Galerkin method uses a local polynomial expansion in each cell, which results in a discontinuity at each cell face. This discontinuity can be represented as a Riemann problem, which provides a natural way to introduce upwinding into a finite element method. The DG method can therefore be considered as a mixture of an upwind finite volume method and a finite element method.

A key feature of the DG method is that equations for the moments of the flow field are also solved. In this way a completely local higher order accurate spatial discretization can be obtained without the need to use neighboring cells in the discretization. An alternative way to obtain the flow field gradients is to use Gauss' identity, but this method requires grid regularity to be accurate. The use of the moment equations is extremely useful in combination with local grid refinement, because no problems with hanging nodes occur and the scheme maintains its accuracy on highly irregular grids, which generally occur after several grid refinement steps. In this paper the spatial accuracy is limited to second order and the moments represent the flow field gradients. A disadvantage of using the moment equations is that more memory is needed to store the additional moments of the flow field. For future LES applications in wall bounded flows this disadvantage is, however, more than compensated by the increased computational efficiency of the adapted grid.

The DG method makes it easy to mix different types of elements. Hexahedrons are used as basic elements, but whenever necessary due to topological degeneracies, prisms, tetrahedrons and other degenerated hexahedrons are used. The initial coarse grid is obtained from a multi-block structured grid, generated with the NLR ENFLOW system. This grid is transformed into an unstructured grid using a face-based data structure, van der Vegt [14]. This data structure is more suited to anisotropic local grid refinement than the commonly used octree data structure. Anisotropic grid refinement is important because many flow phenomena are locally pseudo two-dimensional, e.g. shocks and shear layers, and cannot be efficiently captured with isotropic grid refinement.

The DG method combined with the face based data structure is extremely local in nature, which makes it a good candidate for parallel computing. Parallel computers offer the possibility to overcome the physical limits on single processor speed, but require a significant effort to optimize numerical schemes and coding. LES requires significant computer resources, and the performance of the DG method on two different types of parallel shared memory computers, namely a two processor NEC SX-3 and a four processor SGI Power Challenge, will be discussed in this paper. The choice for parallel shared memory computers is made initially to limit the effort in modifying codes.

The outline of the paper is as follows. After a brief description of the governing equations, the DG method will be discussed, followed by a description of the grid adaptation algorithm. The algorithm will be demonstrated on the flow field around a generic delta wing. Next, several aspects of using parallel shared memory computers will be discussed and performance results will be presented. These data will be used to give an estimate of the computational complexity of an LES of a clean wing. The paper finishes with concluding remarks.

GOVERNING  EQUATIONS 

The Euler equations for inviscid gas dynamics in conservation form can be expressed in the flow domain $\Omega$ as:

$$\frac{\partial}{\partial t} U(x,t) + \frac{\partial}{\partial x_j} F^j(U) = 0.$$

Here $x$ and $t$ represent the coordinate vector, with components $x_i$, $i = \{1,2,3\}$, in the Cartesian directions, and time, respectively. The Euler equations are supplemented with the initial condition $U(x,0) = U_0(x)$ and the boundary condition $U(x,t)|_{\partial\Omega} = B(U, U_w)$, where $B$ denotes the boundary operator and $U_w$ the prescribed boundary data. The vectors with conserved flow variables $U$ and fluxes $F^j$, $j = \{1,2,3\}$, are defined as:

$$U = \begin{pmatrix} \rho \\ \rho u_i \\ \rho E \end{pmatrix}; \qquad F^j = \begin{pmatrix} \rho u_j \\ \rho u_i u_j + p\,\delta_{ij} \\ u_j(\rho E + p) \end{pmatrix},$$

where $\rho$, $p$ and $E$ denote the density, pressure and specific total energy, $u_i$ the velocity in the Cartesian coordinate directions $x_i$, $i = \{1,2,3\}$, and $\delta_{ij}$ the Kronecker delta symbol. The summation convention is used on repeated indices. This set of equations is completed with the equation of state: $p = (\gamma - 1)\rho(E - \tfrac{1}{2}u_i u_i)$, with $\gamma$ the ratio of specific heats.
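The definitions above can be illustrated with a minimal Python sketch that evaluates the equation of state and the flux vector $F^j$ from a conservative state. The function names, the list layout `[rho, rho*u1, rho*u2, rho*u3, rho*E]` and the value `GAMMA = 1.4` are assumptions made for this illustration and are not taken from the paper.

```python
GAMMA = 1.4  # ratio of specific heats; value for air, assumed for illustration


def pressure(U, gamma=GAMMA):
    """Equation of state p = (gamma - 1) * rho * (E - 0.5 * u_i u_i)."""
    rho, m1, m2, m3, rho_e = U
    kinetic = 0.5 * (m1 * m1 + m2 * m2 + m3 * m3) / rho  # 0.5 * rho * u_i u_i
    return (gamma - 1.0) * (rho_e - kinetic)


def euler_flux(U, j, gamma=GAMMA):
    """Flux vector F^j(U) for the Cartesian direction j in {0, 1, 2}."""
    rho, rho_e = U[0], U[4]
    momentum = U[1:4]
    u_j = momentum[j] / rho
    p = pressure(U, gamma)
    flux = [rho * u_j]  # mass flux
    for i in range(3):  # momentum fluxes: rho u_i u_j + p delta_ij
        flux.append(momentum[i] * u_j + (p if i == j else 0.0))
    flux.append(u_j * (rho_e + p))  # energy flux
    return flux
```

For example, `euler_flux([1.0, 2.0, 0.0, 0.0, 4.5], 0)` evaluates the flux in the first coordinate direction for a state with unit density, velocity (2, 0, 0) and unit pressure.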

DISCONTINUOUS  GALERKIN  APPROXIMATION 

The flow domain $\Omega$, which is assumed to be a polyhedron, is covered with a triangulation $T_h = \{K\}$ of hexahedrons, which are related to the master element $\hat{R}$ through the mapping $F_K$:

$$F_K : x(\xi,\eta,\zeta) = \sum_{i=1}^{8} x_i\,\psi_i(\xi,\eta,\zeta),$$

with $\psi_i(\xi,\eta,\zeta)$ the standard linear finite element shape functions and $x_i$ the coordinates of the vertices of the hexahedron K.
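The mapping from the master element to a hexahedron can be sketched as follows. The vertex ordering in `SIGNS` and the function names are assumptions chosen for this illustration; the paper does not specify a numbering convention.

```python
# Reference coordinates (xi, eta, zeta) of the eight vertices of [-1, 1]^3.
# This ordering is an assumption; any consistent numbering works.
SIGNS = [(-1, -1, -1), (1, -1, -1), (1, 1, -1), (-1, 1, -1),
         (-1, -1, 1), (1, -1, 1), (1, 1, 1), (-1, 1, 1)]


def shape_functions(xi, eta, zeta):
    """Standard trilinear shape functions, one per vertex of the master element."""
    return [0.125 * (1 + sx * xi) * (1 + sy * eta) * (1 + sz * zeta)
            for sx, sy, sz in SIGNS]


def map_to_physical(vertices, xi, eta, zeta):
    """Evaluate x(xi, eta, zeta) = sum_i x_i psi_i(xi, eta, zeta)."""
    psi = shape_functions(xi, eta, zeta)
    return tuple(sum(p * v[d] for p, v in zip(psi, vertices)) for d in range(3))
```

The shape functions sum to one everywhere in the master element, so the center (0, 0, 0) maps to the average of the eight vertices.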

Define on the master element $\hat{R} = [-1,1]^3$ the space of polynomials $P^k(\hat{R}) = \mathrm{span}\{\hat\phi_j(\xi,\eta,\zeta),\ j = 0,\ldots,M\}$ and the related space $P^k(K)$ as the space of functions whose images under $F_K$ are functions in $P^k(\hat{R})$: $P^k(K) = \mathrm{span}\{\phi_j(x) = \hat\phi_j \circ F_K^{-1},\ j = 0,\ldots,M\}$. In this paper $k = 1$, which yields a second order accurate spatial discretization with polynomials $\hat\phi \in \{1, \xi, \eta, \zeta\}$, with $M = 3$.


Define $V_h^1(K) = \{P(K) \rightarrow \mathbb{R}^5 \mid p_i \in P^1(K)\}$, then $U(x,t)|_K$ can be approximated by $U_h(x,t) \in V_h^1(K) \otimes C^1[0,T]$ as:

$$U_h(x,t) = \sum_{m=0}^{3} \hat{U}_m(t)\,\phi_m(x). \qquad (1)$$

The expansion of U is local in each element and there is no continuity across element boundaries, which is a major difference with node based Galerkin finite element methods. An important benefit of the element based expansion is that hanging nodes, which frequently appear after local grid refinement, do not give any complications. Degenerated hexahedrons, such as prisms and tetrahedrons, which are necessary to deal with topological degeneracies in the grid, are allowed without further complications because the degenerated surfaces do not contribute to the flux balance.

The discontinuous Galerkin finite element formulation of the Euler equations is given by:

Find $U_h \in V_h^1(K) \otimes C^1[0,T]$, such that $U_h(x,0) = U_0(x)|_K \in V_h^1(K)$, and for $\forall W_h \in V_h^1(K)$:

$$\frac{\partial}{\partial t}\int_K W_h(x)\,U_h(x,t)\,d\Omega = -\sum_{e_K}\int_{e_K} W_h(x)\left(n^T(x)\mathcal{F}(U_h)\right)dS - \sum_{b_K}\int_{b_K} W_h(x)\left(n^T(x)\mathcal{F}(B(U_h,U_w))\right)dS + \int_K \nabla W_h^T(x)\,\mathcal{F}(U_h)\,d\Omega, \qquad (2)$$

with $\mathcal{F} = F^j$, $j = \{1,2,3\}$, and $e_K \subset \partial K \setminus \partial\Omega$ and $b_K \subset \partial K \cap \partial\Omega$ the faces of element K in the interior and at the boundary of the domain $\Omega$, respectively. The vector $n^T$ represents the transposed unit outward normal vector at $\partial K$.

The flux at the faces $e_K$, namely $n^T\mathcal{F}(U) = \hat{F}(U)$, is not clearly defined, because the flow field $U_h$ is discontinuous at the cell faces. The flux is therefore replaced with a monotone flux function $h(U^{\mathrm{int}(K)}, U^{\mathrm{ext}(K)})$, which is consistent, $h(U,U) = \hat{F}(U)$. Here $U^{\mathrm{int}(K)}$ and $U^{\mathrm{ext}(K)}$ denote the value of U at $\partial K$ taken as the limit from the interior and exterior of K. More details can be found in Cockburn et al. [5]. The use of the monotone Lipschitz flux h introduces upwinding into the Galerkin method by solving the (approximate) Riemann problem given by $(U^{\mathrm{int}(K)}, U^{\mathrm{ext}(K)})$. Suitable fluxes are those from Godunov, Roe, Lax-Friedrichs and Osher. In this paper the Osher approximate Riemann solver [11] is used, because of its good shock capturing capabilities and the possibility to easily modify the Riemann problem to account for boundary conditions. An important additional reason for the use of the Osher scheme is that it gives an exact solution for a steady contact discontinuity, and therefore it has a very low numerical dissipation in boundary layers [13], which is important for future extension of the algorithm to the Navier-Stokes equations. The Osher approximate Riemann solver is defined as:

$$h(U^{\mathrm{int}(K)}, U^{\mathrm{ext}(K)}) = \frac{1}{2}\left(\hat{F}(U^{\mathrm{int}(K)}) + \hat{F}(U^{\mathrm{ext}(K)}) - \sum_\alpha \int_{\Gamma_\alpha} \left|\frac{\partial \hat{F}}{\partial U}\right| dU\right),$$

where $\cup_\alpha \Gamma_\alpha$ is a path in phase space between $U^{\mathrm{int}(K)}$ and $U^{\mathrm{ext}(K)}$. Details of the calculation of this path integral in multi-dimensions can be found in [11]. At the boundary surface the path $\Gamma_\alpha$ must be modified to account for boundary conditions. In this way a Riemann initial-boundary value problem is solved instead of an initial value problem [11], and a completely unified and consistent treatment of the flux calculations is obtained, both at interior and exterior faces.
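The structure of an Osher-type flux is easiest to see for a scalar conservation law, where the path integral reduces to an ordinary integral of the absolute wave speed between the interior and exterior states. The sketch below uses the inviscid Burgers equation with $f(u) = u^2/2$ as a stand-in; it is not the Euler implementation used in the paper, and all function names are hypothetical.

```python
def burgers_flux(u):
    """Scalar model flux f(u) = u^2 / 2 (Burgers equation, used as a stand-in)."""
    return 0.5 * u * u


def abs_speed_integral(u_int, u_ext):
    """Integral of |f'(u)| = |u| from u_int to u_ext, signed by path direction."""
    def antiderivative(u):
        return 0.5 * u * abs(u)
    return antiderivative(u_ext) - antiderivative(u_int)


def osher_flux(u_int, u_ext):
    """h = 0.5 * (f(u_int) + f(u_ext) - integral of |f'| along the path)."""
    return 0.5 * (burgers_flux(u_int) + burgers_flux(u_ext)
                  - abs_speed_integral(u_int, u_ext))
```

When both states have positive wave speed the flux reduces to the upwind value $f(u^{\mathrm{int}})$, and for equal states it returns $f(u)$, which is the consistency condition $h(U,U) = \hat{F}(U)$ in scalar form.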


The first order accurate discontinuous Galerkin method with an (approximate) Riemann solver yields monotone results, but second and higher order discretizations need a slope limiter to prevent numerical oscillations around discontinuities and in regions with steep gradients. Cockburn et al. [5] derived a local projection limiter on B-triangulations for multi-dimensional scalar conservation laws, which gives a second order accurate scheme and satisfies a maximum principle when combined with a TVD Runge-Kutta time integration method [12]. The extension to quadrilaterals is presented by Bey and Oden [2], but turned out to be very dissipative.

In this paper a different approach is followed. The second order discontinuous Galerkin method strongly resembles a MUSCL upwind scheme, the main difference being the procedure to determine the flow gradient. In the DG method the gradient is determined by solving equations for the moments $\hat{U}_m$, $m = \{1,2,3\}$, whereas the MUSCL scheme determines the gradient using data from surrounding cells. The same limiting procedure can, however, be followed. In this paper the multi-dimensional limiter from Barth and Jespersen [1], with the modifications proposed by Venkatakrishnan [15], is used. The benefit of the limiter from Barth and Jespersen is that it is a truly multi-dimensional limiter and yields a positive scheme.

The limiter from Barth and Jespersen can, however, seriously degrade convergence to steady state. This was analysed by Venkatakrishnan [15], and the two main causes for this phenomenon are the non-smoothness of the limiter, which uses min- and max-functions, and the fact that the limiter is active in smooth parts of the flow, e.g. in the far field.

The limiter according to Venkatakrishnan [15] is directly applied to the conservative variables, which saves the considerable expense of computing the local characteristic decomposition.

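Define for each component $\bar{U}_K^i$ of the cell average $\bar{U}_K$:

$$U_K^{i,\min} = \min_{\forall K' \in N(K)} \left(\bar{U}_K^i, \bar{U}_{K'}^i\right), \qquad U_K^{i,\max} = \max_{\forall K' \in N(K)} \left(\bar{U}_K^i, \bar{U}_{K'}^i\right),$$

with $N(K)$ the set of neighboring cells which connect to cell K. In order to maintain monotonicity the approximate flow field $U_h$ must satisfy $U_h^i(x) \in [U_K^{i,\min}, U_K^{i,\max}]$, $\forall x \in K$, which is accomplished with the limiter function $\Psi_K^i$ defined as:

$$\Psi_K^i = \begin{cases} \phi_L\!\left(\dfrac{U_K^{i,\max} - \bar{U}_K^i}{U_{K*}^i - \bar{U}_K^i}\right) & \text{if } U_{K*}^i - \bar{U}_K^i > 0 \\[2ex] \phi_L\!\left(\dfrac{U_K^{i,\min} - \bar{U}_K^i}{U_{K*}^i - \bar{U}_K^i}\right) & \text{if } U_{K*}^i - \bar{U}_K^i < 0 \\[2ex] 1 & \text{if } U_{K*}^i - \bar{U}_K^i = 0 \end{cases}$$

Here $U_{K*}^i$ are the components of $U_h$ at the Gauss quadrature points in the cell faces of K used to evaluate the integrals in equation (2). The function $\phi_L(y)$ replaces $\min(1,y)$ in the original Barth and Jespersen limiter and is defined as:

$$\phi_L(y) = \frac{y^2 + 2y}{y^2 + y + 2}.$$

Defining $\Delta = U_{K*}^i - \bar{U}_K^i$, $\Delta_+ = U_K^{i,\max} - \bar{U}_K^i$ and $\Delta_- = U_K^{i,\min} - \bar{U}_K^i$, and replacing $\Delta_\pm^2$ with $\Delta_\pm^2 + \epsilon_K$, a smoother limiter is obtained:

$$\Psi_K^i = \begin{cases} \dfrac{\Delta_+^2 + \epsilon_K + 2\Delta\Delta_+}{\Delta_+^2 + 2\Delta^2 + \Delta\Delta_+ + \epsilon_K} & \text{if } \Delta > 0 \\[2ex] \dfrac{\Delta_-^2 + \epsilon_K + 2\Delta\Delta_-}{\Delta_-^2 + 2\Delta^2 + \Delta\Delta_- + \epsilon_K} & \text{if } \Delta < 0 \\[2ex] 1 & \text{if } \Delta = 0 \end{cases}$$

The coefficient $\epsilon_K$ is set equal to $\epsilon_K = (C\,\Delta s_K)^3$, with $\Delta s_K$ the minimum distance between the cell face centers of two opposite faces of element K. The constant C determines the balance between limiting and no limiting and thereby influences the convergence to steady state. If C = 0 the original Barth and Jespersen limiter is obtained. In this paper C = 1 is used.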
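A minimal sketch of a smoothed limiter of this type is given below, written in the form published by Venkatakrishnan, with the smoothing term added to the squared differences. The function and argument names are hypothetical, and the cubic exponent in `limiter_epsilon` follows Venkatakrishnan's formulation, which is assumed here.

```python
def limiter_epsilon(c, ds):
    """Smoothing coefficient (c * ds)**3; the cubic exponent is an assumption."""
    return (c * ds) ** 3


def smooth_limiter(delta, delta_plus, delta_minus, eps):
    """Smoothed limiter value for one flow component at one quadrature point.

    delta       : value at the quadrature point minus the cell average
    delta_plus  : neighborhood maximum minus the cell average (>= 0)
    delta_minus : neighborhood minimum minus the cell average (<= 0)
    eps         : smoothing coefficient, e.g. from limiter_epsilon
    """
    if delta == 0.0:
        return 1.0
    bound = delta_plus if delta > 0.0 else delta_minus
    numerator = bound * bound + eps + 2.0 * delta * bound
    denominator = bound * bound + 2.0 * delta * delta + delta * bound + eps
    return numerator / denominator
```

With `eps = 0` the expression reduces to $(y^2 + 2y)/(y^2 + y + 2)$ with $y$ the ratio of the bound to `delta`. A large `eps` relative to the local differences drives the limiter toward one, which is how limiting is switched off in smooth regions.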


The limiter $\Psi_K^i$ is applied independently to each component of the flow field: $\tilde{U}_m^i = \Psi_K^i \hat{U}_m^i$, $m = \{1,2,3\}$. This is slightly less robust than using $\Psi_K = \min_i \Psi_K^i$, but gives significantly less numerical dissipation. The coefficients $\hat{U}_m$, $m = \{1,2,3\}$, in equation (1) represent the gradient of the flow field with respect to the local coordinates in $\hat{R}$. This modification of the local gradient would violate conservation of U in K, which can be corrected by modifying the coefficient $\hat{U}_0$:

$$\tilde{U}_0^i = \hat{U}_0^i + \sum_{m=1}^{3} (1 - \Psi_K^i)\,\hat{U}_m^i\,\frac{1}{\mathrm{meas}(K)}\int_K \phi_m(x)\,d\Omega.$$

This relation is obtained from the condition $\frac{1}{\mathrm{meas}(K)}\int_K U_h(x)\,d\Omega = \bar{U}_K$. The limited flow field in cell K is then equal to:

$$\tilde{U}_h(x,t) = \sum_{m=0}^{3} \tilde{U}_m(t)\,\phi_m(x).$$


The final discontinuous Galerkin finite element discretization is now obtained by evaluating the integrals over the element K and its boundary $\partial K$ in equation (2). This is done using the transformation $F_K$ between K and the master element $\hat{R}$. The integrals $\int_K W_h U_h\,d\Omega$ are calculated analytically, which requires quite some algebra, whereas the other integrals are calculated with Gauss quadrature rules. Cockburn et al. [5] proved that if the quadrature rules for the surface integrals in equation (2) are exact for polynomials of degree $2k+1$ and the rules for the volume integrals are exact for polynomials of degree $2k$, then the spatial accuracy of the DG method is $k+1$. In order to preserve uniform flow it is necessary to use quadrature rules which are exact for polynomials of order 3. For $k = 1$ the surface integrals are calculated with four point Gauss quadrature rules. The volume integrals require six point Gauss quadrature rules.
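The exactness requirement for the face integrals can be checked numerically. The sketch below assumes that the four point rule is the 2 x 2 tensor-product Gauss-Legendre rule on the reference face $[-1,1]^2$; the paper does not state which four point rule is used, and the function name is hypothetical.

```python
import math

# Two-point Gauss-Legendre rule on [-1, 1]: nodes +-1/sqrt(3), weights 1.
GAUSS_2 = [(-1.0 / math.sqrt(3.0), 1.0), (1.0 / math.sqrt(3.0), 1.0)]


def integrate_face(f):
    """Integrate f(xi, eta) over [-1, 1]^2 with the 2 x 2 tensor-product rule."""
    return sum(wx * wy * f(x, y) for x, wx in GAUSS_2 for y, wy in GAUSS_2)
```

The rule reproduces integrals of polynomials up to degree 3 in each variable, for instance $\int\!\!\int \xi^2\eta^2 = 4/9$, but not $\int\!\!\int \xi^4 = 4/5$.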

The use of four and six point Gauss quadrature rules is, however, unnecessarily expensive. The number of flux calculations in the approximation of the surface integrals can be reduced from four to one using the following approximation, which is second order accurate in the mean:

$$\int_{\partial\Omega} \phi_m(x)\,n^T\mathcal{F}(U)\,d\Omega = \int_{\partial\hat{\Omega}} \phi_m(x)\,n^T\mathcal{F}(U)\,J_e\,d\hat{\Omega} \cong \left(\int_{\partial\hat{\Omega}} \phi_m(x)\,n^T J_e\,d\hat{\Omega}\right)\mathcal{F}(U)|_c,$$

with $\mathcal{F}(U)|_c$ calculated at the cell face center and $J_e$ the Jacobian of the transformation of the cell face $\partial\Omega$ to $\partial\hat{\Omega}$ on the master element $\hat{R}$. The integrals $\int_{\partial\hat{\Omega}} \phi_m(x)\,n^T J_e\,d\hat{\Omega}$ are precalculated with four point Gauss quadrature rules, which are exact for elements defined with linear shape functions, and therefore free stream consistency is preserved with this approximation. A similar approximation can be made for the volume integral $\int_K \nabla W_h^T(x)\,\mathcal{F}(U_h)\,d\Omega$, with $\mathcal{F}(U)$ calculated in the center of K and the geometrical part of the volume integral precalculated with a six point Gauss quadrature rule. This formulation requires about a quarter of the computing time of the more accurate evaluation of the flux integrals and yields similar results. The discretization using four and six Gauss quadrature points for the surface and volume integrals yields, however, a slightly more robust scheme on coarse grids. This is mainly due to the fact that the cross-coupling terms in the moment equations are retained in this case.

For  each  element  K  a  system  of  ordinary  differential  equations 
is  now  obtained: 

$$[M_K]\,\frac{\partial}{\partial t}\hat{U}_K = R_K,$$

with $\hat{U}_K$ a vector with the moments of the flow field in each element, $\hat{U}_m$, $m = \{0,\ldots,3\}$, and $R_K$ the right-hand side of equation (2). The equations for $\hat{U}_K$ are integrated in time using the third order TVD Runge-Kutta scheme from Shu [12]. For steady state calculations convergence is accelerated using local time stepping.

A  significant  difference  with  node  based  FEM  is  that  the  mass 
matrix  [Mk]  is  uncoupled  for  each  element  K  and  can  be  easily 
inverted. 
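The time integration step can be sketched in a few lines. The version below is the three-stage third order TVD Runge-Kutta scheme in the commonly quoted Shu-Osher form, which is assumed to correspond to the scheme cited from [12]. The callable `rhs` is a hypothetical stand-in for the element residual with the inverse mass matrix already applied.

```python
def tvd_rk3_step(u, dt, rhs):
    """Advance the list of unknowns u by one step dt of du/dt = rhs(u)."""
    # Stage 1: forward Euler step.
    u1 = [a + dt * r for a, r in zip(u, rhs(u))]
    # Stage 2: convex combination of u and an Euler step from u1.
    r1 = rhs(u1)
    u2 = [0.75 * a + 0.25 * (b + dt * r) for a, b, r in zip(u, u1, r1)]
    # Stage 3: convex combination of u and an Euler step from u2.
    r2 = rhs(u2)
    return [a / 3.0 + 2.0 / 3.0 * (b + dt * r) for a, b, r in zip(u, u2, r2)]
```

Each stage is a convex combination of forward Euler steps, which is the property that carries the stability of the first order scheme over to the higher order time integration. With local time stepping, `dt` would differ per element; that variant is not shown here.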


DIRECTIONAL  GRID  ADAPTATION 


The use of increasingly finer grids in LES in the near wall region, as proposed by Chapman [3], and in other regions with strong shear layers or shocks can be most efficiently done using local grid refinement. The grid is locally enriched by subdividing cells, independently in each of the three local grid directions $\xi$, $\eta$ or $\zeta$ of $\hat{R}$. This anisotropic grid refinement is more efficient in capturing local flow phenomena than isotropic refinement, because many flow features are pseudo two-dimensional. A coarse initial grid is used, which is generated with a multi-block structured grid generator and transferred into an unstructured hexahedron grid. If necessary, degenerated hexahedrons, such as prisms and tetrahedrons, are allowed to deal with topological degeneracies. After calculating the flow field, the grid cells are split in the local $\xi$-direction if:

$$\frac{S_\xi(K)}{\max_{\forall K \in T_h} S_\xi(K)} > \text{tolerance},$$

with the sensor function $S_\xi(K)$ for the cell K defined as:

$$S_\xi(K) = \max_{i \in \{1,\ldots,5\},\ \forall K' \in N_\xi(K)} \left|V_{K'}^i - V_K^i\right| \Delta\xi_K. \qquad (3)$$

Here $\Delta\xi_K$ is the length of cell K in the local $\xi$-direction, $V = (\rho, u, v, w, p)^T$ the vector with primitive variables and $N_\xi(K)$ the indices of the neighboring cells of cell K in the $\xi$-direction. Equivalent expressions are used for the $\eta$ and $\zeta$ directions. This sensor is based on an equidistribution principle, see for instance Merchant et al. [9]. An important advantage of this sensor is that it prevents regions with discontinuities from constantly dominating the local grid refinement. After several refinements the relative contribution of regions with discontinuities reduces, because $\Delta\xi_K$ in equation (3) becomes progressively smaller.
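The refinement criterion can be sketched as follows, under the assumption that the sensor is the largest jump in any primitive variable to a neighbor in the given direction, weighted by the cell length in that direction. The function names and data layout are hypothetical.

```python
def direction_sensor(cell_length, v_cell, v_neighbors):
    """Largest primitive-variable jump to a neighbor, scaled by the cell length.

    v_cell      : primitive variables (rho, u, v, w, p) of the cell
    v_neighbors : list of primitive-variable tuples of the neighbors
                  in the direction under consideration
    """
    jumps = [abs(vn[i] - v_cell[i])
             for vn in v_neighbors for i in range(len(v_cell))]
    return cell_length * max(jumps, default=0.0)


def cells_to_split(sensors, tolerance):
    """Indices of cells whose sensor, normalized by the maximum, exceeds tolerance."""
    s_max = max(sensors, default=0.0)
    if s_max == 0.0:
        return []  # uniform flow: nothing to refine
    return [k for k, s in enumerate(sensors) if s / s_max > tolerance]
```

Because the jump is multiplied by the cell length, halving a cell across a discontinuity of fixed strength halves its sensor value, so repeatedly refined shock cells gradually stop dominating the normalized criterion.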


DATA  STRUCTURE 

The discontinuous Galerkin method with local grid refinement of hexahedrons requires a significantly different data structure than the frequently used edge based data structure. The edge based data structure is very efficient for unstructured vertex based schemes using tetrahedrons. The discontinuous Galerkin method is a cell based algorithm and the primary calculations are the evaluation of fluxes through cell faces. This can be done efficiently using a face based data structure. An important benefit of a face based data structure is that there are no limitations on the number of cells which can connect to one cell face, which is crucial for local grid refinement. The alternative would be an octree data structure, but this data structure does not combine well with anisotropic grid refinement. In van der Vegt [14] an algorithm is presented to determine all face to cell connections efficiently. The main element in this algorithm is that cell faces are split into smaller subfaces until each face connects only to one cell on each side. There are no limits on the number of neighboring cells, and using advanced searching algorithms a very efficient scheme is obtained, which can establish all face to cell connections in $O(N \log_2 N)$ operations, with N the number of faces. The fluxes are calculated in one loop over all the faces, which can be fully vectorized using a coloring scheme. The face based data structure does not put any limitations on the number of neighboring cells, but if the number of cells connecting to one face becomes too large then the number of colors significantly increases. This reduces the efficiency on vector and parallel computers and will be a topic of future research. In the grid adaptation process cells are added and deleted, which is done efficiently using AVL-trees; for more details see van der Vegt [14].

Adaptation Step    Cells     Grid Points    Faces
0                  19152     20790          59594
1                  33094     38277          132038
2                  49088     63357          203400
3                  73091     104435         307783
4                  124030    197424         538109
5                  211578    357752         933616
6                  322708    592441         1447763

Table 1: Number of cells, grid points and faces after each adaptation step

DISCUSSION  AND  RESULTS 

The grid adaptation algorithm has been tested on the flow around a generic delta wing. The geometry is a cropped delta wing with a 65-degree sweep angle and a sharp leading edge. A constant airfoil section in the streamwise direction is used (modified NACA 64A005 profile; straight line aft of 75% chord) with 5% relative thickness and no twist or camber. More information about the geometry and experimental results can be found in Elsenaar et al. [7]. A transonic flow test case is used with angle of attack $\alpha = 20°$ and free stream Mach number $M_\infty = 0.85$. The initial grid consisted of 19152 cells and 20790 grid points. The grid is adapted six times, independently in all three directions, and the final grid consists of 322708 cells and 592441 grid points, see Table 1. During each adaptation step approximately 15% of the cells are deleted, after which the number of cells is increased by between 70% and 90%. The removal of grid cells is important, because initially on the coarse grid the refinement sensor is less accurate and some unnecessary refinement takes place. Local time stepping is used and significantly improves convergence to steady state, see Figure 1. The sharp peaks in the convergence plot are caused by the grid adaptation, except for the first peak, which results from freezing the slope limiter after 750 time steps to improve convergence. Freezing of the slope limiter is not necessary after grid adaptation.

Figure 1: Maximum residual in flow field.

Figure 3 shows the pressure field and grid lines on the leeward side of the delta wing. The flow field is dominated by a strong primary vortex which starts at the apex and moves downstream at an angle of 20 degrees to the streamwise direction. Vorticity is generated at the sharp leading edge in a thin vortex sheet and rolls up into the primary vortex. The velocity under this vortex, just above the upper surface, becomes very large and a strong shock develops between the primary vortex and the upper surface, see also Figures 4 and 6. The benefits of anisotropic grid refinement are very clear in Figure 3, where the grid is strongly adapted along the primary vortex in the first 85% of the delta wing, where the flow field is approximately conical. At a chord length of 85%, where the sharp leading edge connects to the tip, the primary vortex and related shock have a sharp kink, see Figures 3 and 6. Two shocks develop in the primary vortex: one normal to the leading edge and connected to the kink in the shock structure under the primary vortex, and another one from the same location on the leading edge and connected more upstream to the shock under the primary vortex. A similar shock structure, although slightly more downstream, was observed by Hoeijmakers et al. [8] using a much finer structured grid. This shock structure has a strong influence on the primary vortex, which completely blows up behind it, see Figure 5, and is very well captured by the grid adaptation. Also visible in Figure 5 is that the grid is adapted to the trailing edge vortex. The primary vortex grows significantly after 85% chord and merges with the tip vortex, see Figure 4. Also visible is the start of roll-up of the wake, which develops into a mushroom type vortex structure. In addition to the shock structures in and around the primary vortex there is also a shock starting at about 75% downstream at the center line and connected to the trailing edge at approximately mid span. A better view of this shock can be obtained in Figure 6, which gives a perspective view of the delta wing and the grid and flow field at approximately 70% chord. Figure 5 clearly shows the strong primary vortex and the shock between the vortex and body. Also visible is the significant refinement in this region and the vortex layer starting at the sharp leading edge.

PARALLELIZATION 

The algorithm described above has been implemented in the program Hexadap, which is parallelized on shared memory machines, namely:

•  A two processor NEC SX-3/22 with a peak performance of 2 × 2.75 GFlop/s, a main memory unit (MMU) of 1 GByte and a 4 GByte Extended Memory Unit (XMU), of which 1.2 GByte can be efficiently used to store run-time data.

•  A four processor SGI Power Challenge with a peak performance of 4 × 350 MFlop/s, main memory of 256 MByte and 16 KByte primary and 4 MByte secondary cache.

                 Small    Medium    Large    Long
vector length    8000     2000      1000     120,000
iterations       100      100       300      100
adaptations      0        1         1        0

Table 2: Problem sizes

The parallelization uses microtasking, i.e. adding parallelization compiler directives, on both machines, and macrotasking, i.e. explicitly assigning tasks to different processors. (Implementation on the SGI Power Challenge is done with the CONCURRENT CALL assertion.) The advantage of microtasking is that the code remains portable. The advantage of macrotasking is that large tasks can be assigned, even if the tasks have no do-loop structure, and memory can be used more efficiently.

The algorithm described above consists of two parts, namely grid adaptation and flow computation. The grid adaptation part, which consists predominantly of scalar operations, requires a domain decomposition for parallelization and is not considered in this paper. The most important component of the flow computation is the calculation of cell face fluxes, which consists of loops over the cell faces. The result is added to the residual in the two cells connected to each cell face. The loops use indirect addressing, and in order to vectorize these loops a coloring scheme has been applied.
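The idea behind the coloring can be sketched with a simple greedy algorithm: faces are grouped so that no two faces of one color touch the same cell, which removes the write conflict in the residual update within a color. This is an illustrative sketch, not the coloring algorithm used in Hexadap; the function names and the sign convention of the residual update are assumptions.

```python
def color_faces(face_cells):
    """Greedily group faces so that faces of one color share no cell.

    face_cells : list of (left_cell, right_cell) index pairs, one per face
    returns    : list of colors, each a list of face indices
    """
    colors = []   # colors[c] holds the face indices of color c
    touched = []  # touched[c] holds the cells already used by color c
    for face, cells in enumerate(face_cells):
        for c, used in enumerate(touched):
            if used.isdisjoint(cells):
                colors[c].append(face)
                used.update(cells)
                break
        else:
            colors.append([face])
            touched.append(set(cells))
    return colors


def accumulate_residual(n_cells, face_cells, face_flux, colors):
    """Add each face flux to the residuals of its two cells, color by color."""
    residual = [0.0] * n_cells
    for group in colors:
        # Within one color every cell is written at most once, so this
        # inner loop has no data dependency and could be vectorized.
        for face in group:
            left, right = face_cells[face]
            residual[left] -= face_flux[face]
            residual[right] += face_flux[face]
    return residual
```

For a one-dimensional chain of four cells with faces `[(0, 1), (1, 2), (2, 3)]`, the first and third face receive the same color and the middle face a second one. More neighbors per cell force more colors and hence shorter inner loops, which is the effect on vector length discussed in the text.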

The initial flow field and the flow fields after three and six adaptations are used to test the parallel performance of the flow solution algorithm, see Table 1. These cases are denoted Small, Medium and Large. The average vector length and number of iterations are presented in Table 2, which shows that the average vector length decreases with problem size. This is caused by the increasing number of colors after grid adaptation. A reduction in the number of colors is possible by limiting the number of neighboring cells connected to each cell face. In order to investigate the dependence of the performance results on the vector length, a special case, labeled Long, is also tested, see Table 2.

NEC  SX-3 

The  two  computationally  most  intensive  parts  of  the  flow  solu¬ 
tion  algorithm  are  the  routines  Limit  and  Flux.  Limit  applies 
a  slope  limiter  to  ensure  monotonicity  and  Flux  computes  the 
fluxes through cell faces. The suffixes 1G and 4G in Tables 3
and  4  refer  to  the  number  of  Gauss  quadrature  points  used  in 
the  evaluation  of  the  flux  integrals.  The  two  routines  constitute 
90%  of  the  total  computing  time.  They  have  roughly  the  same 
structure:  a  nested  loop,  first  over  all  colors  and  then  over  all 
faces  of  one  color. 

MFlop  rates  on  a  single  processor  NEC  SX-3  are  reported  in 
Table  3.  The  rates  are  based  on  flop  counts  and  elapsed  times. 
The  decrease  in  overall  performance  for  the  Medium  and  Large 
problems  is  caused  by  the  larger  number  of  colors  after  grid 
adaptation  which  results  in  a  reduced  vector  length.  The  case 
Long does not suffer from this reduction in performance. Table 2
also indicates whether the grid is adapted.

              Flux    Limit   Total
Small(4G)     474     392     426
Medium(1G)    406     258     232
Large(1G)     371     241     265
Long(4G)      484     445     452
Long(1G)      463     318     314

Table 3: MFlop rates on a single processor NEC SX-3 (based
on elapsed times)

Two  parallelization  strategies  have  been  tested.  The  first  strat¬ 
egy  executes  the  loops  over  the  colors  in  parallel  and  vectorizes 
the  inner  loop  over  the  faces.  Part  of  the  inner  loop  over  the 
faces  consists  of  an  update  of  the  residuals  at  the  cell  centers. 
Within  one  color  all  faces  connect  to  cells  with  different  cell  ad¬ 
dresses,  but  this  is  not  assured  between  different  colors,  causing 
a  data  dependency.  Hence,  in  the  above  parallelization  strategy, 
the  residual  updates  have  to  be  performed  in  a  critical  section, 
where  only  one  processor  is  active  at  a  time.  The  second  strat¬ 
egy  divides  the  loop  over  the  faces  within  one  color  over  the 
available  processors.  The  main  problem  with  this  approach  is 
that  sufficient  vector  length  should  remain  after  loop  division. 

The MFlop rates and speedup results are presented in Table 4.
The timings and speedups are influenced by the use of the external
memory unit (XMU) of the SX-3. The XMU allows for
fast access to data which cannot be placed in core memory. In
sequential runs, the use of the XMU instead of core memory hardly
decreases performance. During parallel execution, however,
locks applied during I/O seriously deteriorate the performance.
If we compensate for the time spent on I/O to the XMU, the
speedups increase; the corrected speedups are labeled Corr in
Table 4. The MFlop rates in Table 4 are based on the corrected
speedups.

The  results  for  the  first  parallelization  strategy,  namely  parallel 
execution of loops over the colors, are obtained using microtasking
and are labeled 'C' in Table 4. The speedups are with
respect to elapsed times. It is clear from the results that the efficiency
of the parallelization is rather low. This has two reasons.
First,  the  critical  section  consumes  20%  of  the  computing  time, 
and  second,  the  parallel  system  overhead  is  about  10%.  This 
large  sequential  part  limits  the  maximum  attainable  speedup  on 
more  processors  to  5. 
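The limit of 5 quoted above follows from Amdahl's law if the 20% critical section is taken as the serial fraction. The short sketch below evaluates that law; the function name `amdahl_speedup` is chosen for this example, and the 10% parallel system overhead mentioned in the text is deliberately left out of the model.

```python
def amdahl_speedup(serial_fraction, n_proc):
    """Ideal speedup on n_proc processors for a given serial fraction."""
    return 1.0 / (serial_fraction + (1.0 - serial_fraction) / n_proc)


# Serial fraction of 20%, as quoted for the critical section.
two_processors = amdahl_speedup(0.2, 2)         # about 1.67
many_processors = amdahl_speedup(0.2, 10 ** 9)  # approaches 1 / 0.2 = 5
```

On two processors the bound is about 1.67, which is consistent with the measured values of at most 1.6 for strategy 'C' once overhead is taken into account.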

The  second  parallelization  strategy,  namely  parallel  execution 
of  loops  over  the  faces  within  one  color,  does  not  suffer  from  a 
critical section. At first the code was parallelized using microtasking.
The program structure is such that the flux computation
is split into many different loops in different functional subroutines.
Therefore the computational load per loop is low, less than
1.5 msec. It turned out that this load is too low to be efficient on
the  NEC  SX-3:  the  parallel  overhead  was  as  large  as,  or  even 
larger  than  the  parallel  gain  and  no  speedup  was  obtained. 

Using macrotasking the parallel overhead could be reduced significantly.
Instead of parallelizing each loop separately, the work
is divided into two tasks in the subroutines Flux and Limit, each
task doing the same job as the subroutines, but on only half the
loop. This not only reduced the parallel system overhead, but
also reduced memory use. In microtasking local data is copied
for each processor; in macrotasking the local data can be defined
per task, and thus approximately halved with respect to the sequential
program. Memory use for the medium sized problem
is 498 MByte for the sequential program, 540 MByte for the
microtasked program and 515 MByte for the macrotasked program.
Speedups for the macrotasked program are presented in
Table 4 and labeled 'F'.

Machine  Case        Strategy  Flux  Corr  Limit  Total  Corr  MFlop/s
SX-3     Small(4G)   C         1.5   1.6   1.6    1.4    1.5   624
                     F         1.6   1.6   1.7    1.3    1.3   566
SX-3     Medium(1G)  C         1.5   1.8   1.2    1.5    1.6   364
                     F         1.3   1.6   1.4    1.5    1.6   376
SX-3     Large(1G)   C         1.4   1.8   1.2    1.2    1.3   356
                     F         1.1   1.4   1.2    1.1    1.2   322
SX-3     Long(4G)    C         1.5   1.5   1.4    1.3    1.4   614
                     F         1.7   1.7   1.6    1.5    1.6   701
SX-3     Long(1G)    C         1.5   1.8   1.3    1.3    1.4   440
                     F         1.6   1.9   1.6    1.5    1.6   495
SGI      Small(4G)   LL        2.9   -     3.3    2.1    -     85
                     F         3.7   -     2.9    2.3    -     94
SGI      Medium(1G)  LL        1.5   -     2.0    1.5    -     37
                     F         2.9   -     2.3    2.0    -     51

Table 4: Speedups relative to single processor performance
(based on elapsed times); SX-3: two processors; SGI: four processors;
C: parallel loop over colors (microtasking); F: parallel
loop over faces within one color (macrotasking); LL: low-level
microtasking

The  decrease  in  parallel  performance  with  increased  problem 
size  can  be  attributed  to  the  reduced  vector  length.  This  is 
clearly  demonstrated  by  the  results  of  test  case  Long,  which 
has  an  average  vector  length  of  120000  in  the  loops  over  the 
cell  faces.  This  problem  reaches  the  highest  parallel  perfor¬ 
mance,  with  a  speed-up  of  1.9  in  routine  Flux.  Another  factor 
which  significantly  reduces  the  performance  of  the  flow  solu¬ 
tion  algorithm  on  a  NEC  SX-3  computer  is  the  limited  memory 
bandwidth.  This  is  especially  important  for  the  large  number 
of  indirectly  addressed  loops  and  a  main  reason  for  the  big  gap 
between  sustained  and  peak  performance.  The  memory  band¬ 
width  limitations  are  the  most  evident  in  Limit,  where  the  ratio 
between  computations  and  load/stores  is  rather  low. 

SGI  Power  Challenge 

The  SGI  Power  Challenge  has  scalar  processors  and  therefore 
no  problems  with  data  dependencies  within  a  processor.  The 
code  was  therefore  parallelized  using  the  second  parallelization 
strategy,  namely  parallel  execution  of  the  loops  over  the  cell 
faces.  Only  the  Small  and  Medium  problems  were  tested,  since 
the  other  problems  did  not  fit  in  memory. 

Two  implementations  are  made,  one  by  parallelizing  each  loop 
separately  (low-level),  and  one  using  the  same  macrotasking 
structure  as  described  in  the  previous  section.  Parallelization  is 
straightforward  using  the  parallel  code  of  the  SX-3.  Directives 
are  changed  to  SGI  directives.  The  macrotasking  is  accom¬ 
plished  using  the  CONCURRENT  CALL  assertion. 

Results of speedups and MFlop rates are presented in Table 4.
The low-level parallelization is labeled 'LL' and the macrotasking
results are labeled 'F'. Since the SGI has no XMU there
is no correction for the speedups: the entire program is run in
core memory. The speedups for macrotasking are better than
for the low-level parallelization. The performance in MFlops of
the four processor SGI Power Challenge, as listed in Table 4, is
between 10% and 17% of the two processor SX-3 performance
and not sufficient for large scale computing. The percentage
of peak performance is between 3% and 7% on the SGI Power
Challenge and between 6% and 13% on the two processor NEC
SX-3.

Figure 2: Cache dependency of speedups on the SGI Power
Challenge in routine Flux(4G) (solid line: Small; dashed line: Medium)

Results  of  the  SGI  Power  Challenge  are  rather  sensitive  to  cache 
misses.  A  parameter  in  the  flow  solution  algorithm  determines 
the  number  of  cell  faces  in  the  flux  calculation  processed  at  one 
time.  Varying  this  parameter  changes  the  amount  of  the  data 
being  processed,  and  can  be  used  to  optimize  the  cache  use  of 
the program. Significant differences can occur, and the optimal
value of the parameter depends on the problem at hand (see
Figure 2). The speedups of Table 4 are computed using the
optimal  timing  results. 
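The role of such a block-size parameter can be illustrated with a small sketch. It is not taken from the flow solver: the names `face_blocks` and `blocked_flux` and the stand-in flux function are assumptions for this example. The point is only that all stages are completed for one block of faces before the next block is started, so the scratch data stays bounded by the block size.

```python
def face_blocks(n_faces, block_size):
    """Yield (start, stop) index ranges that together cover all faces."""
    for start in range(0, n_faces, block_size):
        yield start, min(start + block_size, n_faces)


def blocked_flux(left, right, block_size):
    """Two-stage flux evaluation carried out block by block.

    The scratch list never holds more than block_size entries, so the
    working set is set by block_size and not by the number of faces.
    """
    flux = []
    for start, stop in face_blocks(len(left), block_size):
        # Stage 1: face averages for this block only (scratch data).
        scratch = [0.5 * (left[f] + right[f]) for f in range(start, stop)]
        # Stage 2: a stand-in flux function applied to the scratch data.
        flux.extend(0.5 * s * s for s in scratch)
    return flux
```

The result does not depend on `block_size`; only the amount of data that is live at one time changes, which is what makes the parameter usable for cache tuning.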

Estimate  of  the  computing  time  for  a  LES  of  a  clean  wing  on 
a  NEC  SX-4/16  computer 

The  parallel  performance  on  the  NLR  NEC  SX-3/22  has  been 
used  to  estimate  the  problem  size  of  a  Large  Eddy  Simulation 
of  a  clean  wing  on  a  16  processor  NEC  SX-4,  which  will  be 
delivered  to  NLR  in  1996.  The  NLR  NEC  SX-4/16  is  expected 
to  have  a  peak  performance  of  32  Gflop/s,  a  main  memory  of 
4  GByte  and  8  GByte  XMU.  With  respect  to  the  SX-3/22  its 
architecture  is  more  suited  for  indirect  addressing  and  a  single 
processor  speedup  of  2  is  expected  for  programs  using  indirect 
addressing. 

The  size  of  the  LES  is  primarily  determined  by  the  available 
memory.  Let  N  be  the  number  of  grid  points,  and  n  the  number 
of  flow  variables.  For  a  Large  Eddy  Simulation  with  a  one- 
equation  turbulence  model  we  have  n  =  6.  The  memory  use 


of Hexadap is $8(12n + 40)N + 2 \cdot 10^{8}$ bytes. With an available
memory of 8 GByte and 8 bytes per variable the maximum
number of grid points is $N = 9 \cdot 10^{6}$. Using the estimates given
by  Chapman  [3],  this  number  allows  for  a  LES  with  sublayer 
resolution  around  a  clean  wing  at  a  Reynolds  number  of  approx¬ 
imately  10*. 
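The grid size quoted above can be checked with a few lines of arithmetic. The sketch assumes the memory model $8(12n + 40)N + 2 \cdot 10^{8}$ bytes and takes 8 GByte as $8 \cdot 10^{9}$ bytes; the function name `max_grid_points` is chosen for this example.

```python
def max_grid_points(memory_bytes, n_vars, fixed_bytes=2.0e8):
    """Largest N that fits, assuming memory = 8(12n + 40)N + fixed_bytes."""
    bytes_per_point = 8 * (12 * n_vars + 40)
    return (memory_bytes - fixed_bytes) / bytes_per_point


# n = 6 flow variables (LES with a one-equation model), 8 GByte available.
n_points = max_grid_points(8.0e9, 6)   # roughly 8.7 million points
```

With $n = 6$ each grid point costs 896 bytes, which gives about $8.7 \cdot 10^{6}$ points, in line with the rounded value of $9 \cdot 10^{6}$.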

The computing time $t$ for one time step is estimated from the
relation:

$$ t = \frac{nN}{S_a S_c} \left( \frac{f_S}{r_S} + \frac{1}{S_{16}} \left( 1.3 \cdot 1.1 \, \frac{f_F}{r_F} + \frac{f_L}{r_L} + \frac{f_R}{r_R} \right) \right) $$

with: $S_a$ a factor to account for grid adaptation, $S_a = 0.9$, and
$S_c$ the single processor speedup of the NEC SX-4 compared to
the SX-3, $S_c = 2$. The suffixes S, F, L and R refer to the
following parts of the algorithm: S, serial part; F, subroutine
Flux(1G); L, subroutine Limit; and R, the remaining part of the
flow solution algorithm, which is parallelizable. The variables
$f_i$ denote flop counts in the respective parts of the algorithm
to advance one flow variable one time step in one grid point.
The measured values are: $f_S = 90$, $f_F = 1570$, $f_L = 880$
and $f_R = 180$. The variables $r_i$ denote the measured flop
rates in the respective parts of the algorithm and are equal to:
$r_S = r_R = 350 \cdot 10^{6}$, $r_F = 463 \cdot 10^{6}$ and $r_L = 350 \cdot 10^{6}$ flop/s.
The flop count in routine Flux is increased by 10% for the
viscous contribution and by 30% for a one-equation subgrid model
using the Germano approach. The parallel speedup on a 16 processor
NEC SX-4, denoted by $S_{16}$, is estimated as twelve. The
computing time required to advance one time step on a grid with
$9 \cdot 10^{6}$ grid points is then approximately 28 seconds.

The  time  scale  of  the  smallest  eddies  in  the  flow  field  will  be 
approximately  100  times  larger  than  the  CFL  limit  for  an  explicit 
scheme.  The  CFL  time  step  limitation  can  be  removed  with  an 
implicit,  time  accurate  temporal  discretization  using  multigrid 
acceleration.  With  these  assumptions  a  Large  Eddy  Simulation 
of  a  clean  wing  at  a  Reynolds  number  10*  on  a  mesh  with 
$9 \cdot 10^{6}$ grid points which evolves 6500 time steps, which should
be  sufficient  to  obtain  a  reasonable  statistical  sample,  would 
require  50  hours  on  a  16  processor  NEC  SX-4. 
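The two estimates above can be reproduced numerically. The sketch assumes that the timing relation reads $t = \frac{nN}{S_a S_c}\left(\frac{f_S}{r_S} + \frac{1}{S_{16}}\left(1.3 \cdot 1.1\,\frac{f_F}{r_F} + \frac{f_L}{r_L} + \frac{f_R}{r_R}\right)\right)$ and uses the flop counts and rates quoted in the text; the function name `time_per_step` is chosen for this example.

```python
def time_per_step(n_vars, n_points, s_adapt=0.9, s_cpu=2.0, s_par=12.0):
    """Estimated seconds per time step on the 16 processor SX-4."""
    # Flop counts per flow variable, grid point and time step.
    f_s, f_f, f_l, f_r = 90.0, 1570.0, 880.0, 180.0
    # Measured flop rates on the SX-3 in flop/s.
    r_s = r_l = r_r = 350.0e6
    r_f = 463.0e6
    # Flux cost raised by 30% (subgrid model) and 10% (viscous terms).
    flux_factor = 1.3 * 1.1
    serial = f_s / r_s
    parallel = (flux_factor * f_f / r_f + f_l / r_l + f_r / r_r) / s_par
    return n_vars * n_points / (s_adapt * s_cpu) * (serial + parallel)


seconds = time_per_step(6, 9.0e6)     # about 27.4 s
hours = 6500 * seconds / 3600.0       # about 49.5 h
```

Under these assumptions one step takes about 27.4 seconds and 6500 steps take about 49.5 hours, which agrees with the rounded figures of 28 seconds and 50 hours.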

Conclusions  of  the  parallelization 

Provided  that  the  vector  length  is  sufficient,  the  most  efficient 
parallelization  strategy  for  the  present  flow  solution  algorithm  is 
a  high  level  parallelization  of  loops  over  faces  of  one  color  using 
macrotasking.  Macrotasking  reduces  parallel  system  overhead 
and  memory  use.  Correcting  for  the  XMU  a  maximum  speedup 
of  1.9  is  reached  on  a  two  processor  SX-3. 

There are three causes for the less than perfect overall performance
on the NEC SX-3:

•  I/O between Main Memory and XMU takes significantly
more time in parallel processing,

•  Vector length decreases, and with it single processor speed,

•  Parallel system overhead.

Concerning  the  latter  cause,  the  balance  between  the  two  pro¬ 
cessors  is,  when  corrected  for  the  I/O  between  MMU  and  XMU, 
as  predicted  by  the  size  of  the  parallel  part  of  the  algorithm. 
Hence,  the  computational  load  is  well  balanced,  and  the  remain¬ 
ing  performance  loss  can  only  be  explained  by  parallel  system 
overhead. Since the NEC SX-3 is not primarily suited for parallel
use, the relatively high parallel system overhead is not too
surprising. It is expected that the NEC SX-4 has significantly
less overhead.

Low-level do-loop parallelization on the NEC SX-3 turns out
to be efficient only for loops with a computational load greater
than 1.5 msec.

The parallel efficiency on the SGI Power Challenge is similar,
but the percentage of peak performance is relatively low, even
compared with the NEC SX-3. Moreover, the cache sensitivity
makes the optimization problem-dependent.

The  present  parallelization  on  the  NEC  SX-3  will  not  be  suf¬ 
ficiently  efficient  on  the  16  processor  SX-4.  The  parallel  ex¬ 
ecution  of  the  loops  over  the  cell  faces  is  inefficient  since  the 
loop  length  will  be  too  short  to  be  divided  over  16  processors. 
This  problem  can  be  solved  by  limiting  the  number  of  neigh¬ 
boring  cells  connected  to  one  cell  face  to  at  most  four,  which 
significantly  reduces  the  number  of  colors  and  thereby  increases 
vector length. The parallel execution of the loop over the colors
contains a sequential part of 20%, and hence has a maximum
speedup of 5. This sequential part can be eliminated using a domain
decomposition of the grid, whose main benefit is
that the grid adaptation part can also be executed in parallel.

CONCLUDING  REMARKS 

The  discontinuous  Galerkin  finite  element  method  with  lo¬ 
cal  grid  enrichment  has  been  demonstrated  on  the  three- 
dimensional,  inviscid  flow  field  around  a  delta  wing  at  tran¬ 
sonic  speed.  The  use  of  anisotropic  grid  refinement  of  hexa¬ 
hedron  type  cells  is  effective  in  capturing  the  shock  structure 
and  primary  vortex  on  the  leeward  side  of  the  delta  wing.  The 
discontinuous  Galerkin  method  works  well  on  highly  irregular 
grids  and  is  therefore  a  good  candidate  for  Large  Eddy  Sim¬ 
ulations,  because  it  offers  the  opportunity  to  capture  viscous 
sublayers  with  successively  finer  grids  through  local  grid  refine¬ 
ment.  An  estimate  of  the  required  computational  resources  for 
such  a  simulation  is  presented.  The  use  of  a  face  based  data 
structure  works  well  in  combination  with  local  grid  refinement 
and  allows  efficient  vectorization  and  parallelization  of  the  code. 
On  the  NEC  SX-3  the  possible  speedup  through  parallelization 
strongly depends on the vector length. A maximum speed-up of
1.9 on the two processor NEC SX-3 is obtained when sufficient
vector length is available. A good parallel performance, with
a  speed-up  of  3.7,  is  obtained  on  the  four  processor  SGI  Power 
Challenge,  but  the  results  are  sensitive  to  cache  misses. 

From  the  present  results  it  is  estimated  that  for  future  LES  ap¬ 
plications  in  wall  bounded  flows,  the  gain  from  the  increased 
computational  efficiency  obtained  from  highly  adapted  grids 
more  than  compensates  the  increased  number  of  operations  and 
memory  use.  A  LES  of  a  clean  wing  at  a  Reynolds  number 
of  10^  will  become  feasible  on  a  16  processor  NEC  SX-4  in 
a  turnaround  time  of  one  weekend.  Significant  further  devel¬ 
opments,  such  as  the  addition  of  the  viscous  contribution  and 
implicit  time-accurate  temporal  discretization  using  multigrid 
acceleration  (in  progress),  will,  however,  be  needed  to  reach 
this  goal. 

Figure 5. Total pressure loss and adapted grid in cross-section through primary vortex core. (Mot

Figure 6. Total pressure loss and adapted grid in cross-section at 70% chord. ($M_\infty = 0.85$, $\alpha = 20°$)

Abstract

We  indicate  that  the  use  of  higher  order  accurate 
spatial  discretization  is  necessary  to  obtain  suffi¬ 
ciently  accurate  DNS  for  the  validation  of  subgrid 
models  in  LES.  Furthermore,  we  pay  attention  to 
the  efficiency  of  the  implementation  of  these  dis¬ 
cretizations  on  several  parallel  platforms.  In  order 
to illustrate this, we consider compressible flow over
a flat plate. We give a priori test results for LES of
this flow.

1  Introduction 

One  of  the  most  challenging  problems  in  Com¬ 
putational  Fluid  Dynamics  (CFD)  is  the  accurate 
and  efficient  simulation  of  turbulent  flows  for  rel¬ 
evant  industrial  applications.  The  behaviour  of 
these  flows  is  governed  by  the  Navier-Stokes  equa¬ 
tions.  However,  because  these  applications  usu¬ 
ally  involve  complex  geometries  and  flow-fields,  the 
computational  resources  required  for  directly  solv¬ 
ing  the  Navier-Stokes  equations  are  far  beyond  the 
resources which will be available in the foreseeable
future.  In  this  paper  we  will  focus  on  turbulent 
compressible  flow-problems  in  simple  geometries. 
In  order  to  tackle  these  problems  with  presently 
available  computers,  three  different  aspects  must 
be  considered:  the  modelling  of  turbulent  flows, 
the  numerical  methods  used  to  perform  calculations 
with  these  models,  and  the  implementation  of  these 
methods  on  suitable  computer  platforms. 

As  remarked  above,  direct  solution  of  the  Navier- 
Stokes  equations  (DNS)  is  impossible  for  relevant 
industrial  applications,  due  to  the  high  computa¬ 
tional  requirements.  Therefore,  one  might  use  in¬ 
stead  the  Reynolds  averaged  Navier-Stokes  (RaNS) 
equations  in  which  only  the  statistically  stationary 
flow  is  calculated  and  the  effects  of  turbulence  are 
modelled  by  a  so-called  turbulence  model.  How¬ 
ever,  this  leads  in  general  to  quite  inaccurate  results 


since  the  presently  available  turbulence  models  are 
inadequate  for  more  complicated  flow  phenomena 
like  shock-boundary  layer  interaction  and  massive 
separation.  A  solution  to  this  problem  could  be 
provided  by  Large  Eddy  Simulation  (LES).  In  LES 
only  the  large  eddies  are  calculated,  while  the  effects 
of  the  smaller  eddies,  which  are  thought  to  be  uni¬ 
versal  and  not  geometry-dependent,  are  described 
by  a  subgrid  model. 

However,  before  LES  can  be  used  as  a  tool  in  flow 
simulation,  the  subgrid  model  has  to  be  systemat¬ 
ically  validated.  This  validation  is  usually  carried 
out  by  comparing  LES  results  with  filtered  DNS  re¬ 
sults  for  simple  geometries  and  fairly  low  Reynolds 
numbers.  In  Section  2  we  present  a  priori  test  re¬ 
sults  for  LES  of  compressible  flow  over  a  flat  plate 
for  various  subgrid  models,  including  eddy-viscosity 
models,  the  similarity  model  and  dynamic  models. 
In  the  future  also  a  posteriori  tests  will  be  carried 
out for this flow, as has been done e.g. by Vreman
et  al.  [1]  for  the  compressible  mixing  layer. 

The  numerical  methods  to  perform  the  DNS  are 
discussed  in  Section  3.  The  a  priori  test  results  are 
based  on  DNS  performed  using  a  second-order  fi¬ 
nite  volume  spatial  discretization.  It  is  indicated 
that  the  use  of  higher  order  spatial  discretizations 
makes  it  possible  to  obtain  more  accurate  DNS  re¬ 
sults.  However,  the  use  of  higher  order  central  dif¬ 
ferencing  discretizations,  without  numerical  dissi¬ 
pation,  is  not  without  trouble.  Besides  the  occur¬ 
rence  of  stability  problems,  higher  order  discretiza¬ 
tions  lead  to  wide  stencils,  which,  in  combination 
with a domain-decomposition strategy, seriously affect
the parallel efficiency of the resulting algorithm.

In  Section  4  the  parallel  efficiency  will  be  il¬ 
lustrated  using  some  implementations  of  the  DNS 
solver  on  various  parallel  platforms,  including  dis¬ 
tributed  as  well  as  shared  memory  systems,  and  a 
mixture of these types. Since many parallel platforms
use cache-based processors, we consider some
aspects of implementation of the flow-solver on
these processors. We show that careful use of cache
in the implementation of our type of discretizations
can lead to considerable performance gain.

2  Modelling  of  turbulent  flow 

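The equations describing compressible flow are the
well known Navier-Stokes equations, which represent
conservation of mass, momentum and energy:

$$
\begin{aligned}
\partial_t \rho + \partial_j(\rho u_j) &= 0 \\
\partial_t(\rho u_i) + \partial_j(\rho u_i u_j) + \partial_i p - \partial_j \tau_{ij} &= 0 \qquad (1) \\
\partial_t e + \partial_j((e + p) u_j) - \partial_j(\tau_{ij} u_i - q_j) &= 0
\end{aligned}
$$

Here the symbols $\partial_t$ and $\partial_j$ are abbreviations of
the partial differential operators $\partial/\partial t$ and $\partial/\partial x_j$
respectively. The components of the velocity vector
are denoted by $u_i$, while $\rho$ is the density and $p$ the
pressure, which is related to the total energy density
$e$ by:

$$ p = (\gamma - 1)\left(e - \tfrac{1}{2}\rho u_i u_i\right) \qquad (2) $$

in which $\gamma$ denotes the adiabatic gas constant. The
viscous stress tensor $\tau_{ij}$ is a function of temperature
$T$ and velocity vector $u$:

$$ \tau_{ij}(T, u) = \frac{\mu(T)}{Re}\left(\partial_j u_i + \partial_i u_j - \tfrac{2}{3}\delta_{ij}\partial_k u_k\right) \qquad (3) $$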

where $\mu(T)$ is the dynamic viscosity for which we
either use Sutherland's law for air or treat it as a
constant. In addition $q_j$ represents the viscous heat
flux vector, given by

$$ q_j = -\frac{\mu(T)}{(\gamma - 1)\, Re\, Pr\, M^2}\,\partial_j T \qquad (4) $$

where $Pr$ is the Prandtl number. Finally, the temperature
$T$ is related to the density and the pressure
by the ideal gas law

$$ T = \gamma M^2 \frac{p}{\rho} \qquad (5) $$

These governing equations have been made dimensionless
by introducing a reference length $L_0$, velocity
$u_0$, density $\rho_0$, temperature $T_0$ and viscosity
$\mu_0$. The values of the Reynolds number $Re =
(\rho_0 u_0 L_0)/\mu_0$ and the Mach number $M = u_0/a_0$,
where $a_0$ is a reference value for the speed of sound,
are given separately.

A Direct Numerical Simulation (DNS) is based on
a discretisation of (1) whereas the governing equations
for large eddy simulation (LES) are obtained
by applying a spatial filter to these equations. A
filter operation extracts the large scale part $\bar{f}$ from
a quantity $f$:

$$ \bar{f}(x) = \int_\Omega G_\Delta(x, \xi)\, f(\xi)\, d\xi \qquad (6) $$

where $\Omega$ is the flow domain and $\Delta$ denotes the filter
width of the kernel $G$, which is assumed to be
normalized, i.e. the integral of $G$ over $\Omega$ equals 1
independent of $x$. For compressible flow Favre [2]
introduced a related filter operation $\tilde{f} = \overline{\rho f}/\bar{\rho}$.
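A discrete counterpart of the filter and of the Favre filter can be sketched in one dimension. The sketch assumes a top-hat kernel on a uniform periodic grid, which is a simplification: the paper does not specify its kernel here and uses a special treatment near the wall. The function names `top_hat_filter` and `favre_filter` are chosen for this example.

```python
def top_hat_filter(values, half_width):
    """Periodic 1D top-hat filter over 2 * half_width + 1 grid points.

    The weights are equal and sum to one, so the kernel is normalized
    and a constant field is returned unchanged.
    """
    n = len(values)
    width = 2 * half_width + 1
    return [
        sum(values[(i + k) % n] for k in range(-half_width, half_width + 1)) / width
        for i in range(n)
    ]


def favre_filter(rho, f, half_width):
    """Density-weighted (Favre) filter: filtered(rho * f) / filtered(rho)."""
    rho_bar = top_hat_filter(rho, half_width)
    rho_f_bar = top_hat_filter([r * x for r, x in zip(rho, f)], half_width)
    return [rf / r for rf, r in zip(rho_f_bar, rho_bar)]
```

For uniform density the Favre filter reduces to the plain filter; the two differ only where the density varies over the filter width.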

The filtered Navier-Stokes equations contain so-called
subgrid-terms, which cannot be expressed in
the filtered flow variables, and have to be modelled
with subgrid-models. In this paper we will mainly
focus on the modelling of the subgrid-terms in the
momentum equations, which can be expressed in
the turbulent stress tensor, defined as

$$ \bar{\rho}\tau_{ij} = \overline{\rho u_i u_j} - \overline{\rho u_i}\,\overline{\rho u_j}/\bar{\rho} = \bar{\rho}\left(\widetilde{u_i u_j} - \tilde{u}_i \tilde{u}_j\right), \qquad (7) $$

where $\tilde{u}$ is the filtered velocity vector. This turbulent
stress tensor has several algebraic properties
which can be used in the construction and qualification
of subgrid-models [3, 4]. Expressions for the
subgrid-terms  in  the  energy  equation  can  be  found 
in  ref.  [5].  They  can  be  neglected  in  simulations 
at  low  Mach  numbers,  but  have  to  be  modelled  at 
high  Mach  numbers. 

In total six models for the turbulent stress tensor
$\tau_{ij}$ as it appears in the subgrid-terms in the
momentum equations will be investigated and compared
in this paper. The first subgrid-model is the
Smagorinsky model

$$ \bar{\rho}\tau_{ij} = -\bar{\rho}\, C_S^2 \Delta^2 |S|\, S_{ij}, \qquad (8) $$

where $|S|^2 = \tfrac{1}{2} S_{ij} S_{ij}$ with $S_{ij}$ the compressible strain
rate, based on the Favre-filtered velocity. $C_S$ is the
Smagorinsky constant, which we choose equal to
0.17 as suggested in literature. $\Delta$ denotes the filter
width, which separates the resolved and subgrid-scales.
The major shortcoming of the Smagorinsky
model is its excessive dissipation in regions where
the flow is laminar [6]. The similarity model, formulated
by Bardina et al. [7], is based on a similarity
assumption. Application of the definition of $\bar{\rho}\tau_{ij}$ to
the filtered variables $\bar{\rho}$ and $\tilde{u}$ yields the similarity
model [7]:

$$ \bar{\rho}\tau_{ij} = \overline{\bar{\rho}\tilde{u}_i\tilde{u}_j} - \overline{\bar{\rho}\tilde{u}_i}\;\overline{\bar{\rho}\tilde{u}_j}/\bar{\bar{\rho}} \qquad (9) $$

The gradient model is derived with use of Taylor
expansions of the filtered velocity [8]. The lowest
order term in $\Delta$ in this expansion can be proposed
as subgrid-model:

$$ \bar{\rho}\tau_{ij} = \tfrac{1}{12}\Delta^2\, \bar{\rho}\,(\partial_k \tilde{u}_i)(\partial_k \tilde{u}_j) \qquad (10) $$

The similarity and gradient models correlate much
better with the turbulent stress tensor than the
Smagorinsky model (see [9] and section 2.1). However,
while the Smagorinsky model is too dissipative
in transitional regions, the similarity and gradient
models are not sufficiently dissipative in turbulent
regions.

The dynamic procedure overcomes the excessive
dissipation of the Smagorinsky model and adds sufficient
dissipation to the similarity and gradient
models. We consider three dynamic models. The
dynamic eddy-viscosity model [3] is obtained when
the model constant $C_S$ in the Smagorinsky model is
replaced by a coefficient which is dynamically obtained
and depends on the local structure of the
flow. In order to calculate the dynamic coefficient,
the Smagorinsky model (8) is substituted in the Germano
identity, which is a relation between the turbulent stress
tensors for different filter widths [3]. The second dynamic
model is the dynamic mixed model, in which a
relatively accurate representation of the turbulent
stress by the similarity model and a proper dissipation
provided by the dynamic eddy-viscosity concept
are combined [10]. The dynamic model coefficient
is obtained by substitution of the base mixed
model, the sum of the similarity model (9) and the
Smagorinsky model (8), in the Germano identity. Another
dynamic model is the dynamic Clark model
[11]. In this case the base model is the Clark model,
the sum of the gradient model (10) and the Smagorinsky
model (8), and the model coefficient $C_S$ is obtained
by substitution of this model in the Germano identity.

2.1  Results 

We consider flat plate flow at $Re = 1000$ based on the
initial displacement thickness $\delta^*$; the other reference
scales are equal to the initial far-field values.
We choose $M = 0.5$ and consider a temporal simulation
in a cubic domain of size 30. A forcing term
corresponding  to  the  compressible  similarity  solu¬ 
tion  of  the  boundary  layer  equations  is  added.  The 
mean  initial  field  also  equals  this  similarity  solution, 
to  which  the  dominant  2D  mode  and  a  pair  of  equal 
and  oblique  3D  modes  are  added  with  amplitude 
10~^  and  amplitude-ratios  (1/2, 1/4, 1/4)  respec¬ 
tively.  For  validation  purposes  the  linear  growth 
rates of the instabilities were recovered with a relative
error well within 1 percent on a grid with $128^3$
cells, uniform in the stream- and spanwise directions
and clustered near the isothermal, no-slip wall
in the normal direction. A second order accurate finite
volume method was used.




Figure 1: Modes of kinetic energy (a) [(1,0): solid,
(2,0): dashed, (1,1): dotted, (2,2): dash-dotted] and
shape-factor (solid), skin-friction (dashed) versus time
t (b)

Results from a DNS on $128^3$ cells are shown in
Figure  1.  The  persisting  symmetry  in  the  span- 
wise  direction  was  exploited  in  order  to  reduce  the 
computational  effort.  The  evolution  of  the  ampli¬ 
tude  of  some  modes  of  the  kinetic  energy  (Fig.  1) 
clearly  displays  the  initial  linear  regime  with  an 
exponential  growth  of  the  instabilities.  The  corre¬ 
sponding  large-scale  structures  which  emerge  sub¬ 
sequently  interact  in  the  nonlinear  regime  and  give 
rise  to  a  rapid  transition  in  which  many  modes  be¬ 
come  simultaneously  important.  A  broad  spectrum 
is  generated  and  a  developed  turbulent  flow  results 
in  which  the  individual  modes  display  an  erratic 
time-dependence. To represent this scenario in a
different way, the shape-factor and the skin-friction
are shown in Figure 1. The resolution is adequate
in  the  linear  and  transitional  stages  with  a  fall  off 
of  10  decades  or  more  in  the  spectrum  of  the  kinetic 
energy.  However,  at  the  onset  of  turbulent  flow  and 
in  the  developed  stages  a  fall  off  of  no  more  than  6-7 
decades  was  observed.  Hence,  the  results  in  the  tur¬ 
bulent  regime  are  expected  to  be  only  qualitatively 
correct  and  further  grid  refinement  is  needed. 


Figure 2: Dynamic coefficients: Germano (solid),
dynamic mixed (dashed) and dynamic Clark (dash-dotted).

In order to obtain a first impression of the quality of the various
subgrid models for this flow we focus on the correlation between a
component of the turbulent stress tensor $\bar{\rho}\tau_{ij}$ and the
corresponding modelled component. We use a filter-width equal to four
grid-cells and a special filtering near the wall which prevents the
filter from extending into the wall. The models are tested both in the
transitional and in the turbulent regime. The similarity and gradient
models as well as the dynamic mixed and dynamic Clark models show a
high correlation of about 0.9. The Smagorinsky and dynamic
eddy-viscosity models show a poor correlation of about 0.3. The
eddy-viscosity contribution in the dynamic mixed and dynamic Clark
models does not destroy the high correlation. In Figure 2 we compare
the dynamic coefficients for the three dynamic models at t = 2700. The
coefficients are averaged over the homogeneous directions. We observe
that the Germano coefficient is larger than the coefficients
associated with the other two dynamic models. Moreover, all
coefficients drop to zero in the near-wall region, which is
appropriate for wall-bounded shear layers.
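The a priori test described above compares an exact stress component with its modelled counterpart point by point. A minimal sketch of such a comparison follows, assuming the correlation is the usual Pearson coefficient (the source does not state the exact definition) and using synthetic stand-in data, not DNS results; all names are hypothetical.

```python
import math

def correlation(a, b):
    """Pearson correlation coefficient between two equally long sequences."""
    n = len(a)
    mean_a = sum(a) / n
    mean_b = sum(b) / n
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    return cov / math.sqrt(var_a * var_b)

# Synthetic stand-ins (assumption): an "exact" stress signal and two model
# predictions, one dominated by the signal and one dominated by unrelated noise.
exact = [math.sin(0.3 * i) for i in range(200)]
good_model = [0.8 * e + 0.1 * math.cos(1.7 * i) for i, e in enumerate(exact)]
poor_model = [0.2 * e + math.cos(1.7 * i) for i, e in enumerate(exact)]

corr_good = correlation(exact, good_model)
corr_poor = correlation(exact, poor_model)
```

A model whose output follows the exact stress up to a scale factor still correlates well, which is why a high correlation alone says nothing about the magnitude of the modelled stress.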


3  Numerical  method 

As has been remarked in the previous section, the DNS results in the
turbulent regime are expected to be only qualitatively correct, and
further grid refinement is needed. However, the number of grid-cells
used is already fairly large for presently available computer
resources. Instead of refinement, we presently consider the use of
higher order discretization methods. The aim is to obtain a more
accurate DNS with a moderate number of points. However, this is not
without problems. One drawback is that high order methods lead to wide
stencils, which decreases the parallel efficiency of the resulting
code, as we will see in the next section. Another problem associated
with these methods is that the discretisation of the convective and
the viscous flux must be carefully constructed in order to avoid
instabilities. This is especially present in central differencing
methods, and is not only related to the occurrence of $\pi$-modes, but
also to adequate damping of aliasing errors.

3.1  Spatial  discretization 

Consider an orthogonal grid with points $(x_i, y_j, z_k)$ which is
uniform in the x and z directions. We use the following central
differencing discretization of the $\partial/\partial x$-operator:

$$\left(\frac{\partial f}{\partial x}\right)_{i,j,k} \approx \sum_{n=-d}^{d} w^{\mathrm{diff}}_{n}\, a_{i+n,j,k}, \qquad (11)$$

where

$$a_{i,j,k} = \sum_{n,m=-d}^{d} w^{\mathrm{av}}_{j,n,m}\, f_{i,j+n,k+m}. \qquad (12)$$

Here the weights $w^{\mathrm{diff}}$ are derivative weights, and
$w^{\mathrm{av}}$ are average weights. Due to the uniformity in the x
and z directions they only depend on j. The quantities a represent the
average of the function f over a stencil in the j-k direction. For the
convective flux we use a stencil with $N_c$ points, and the weights
$w^{\mathrm{av}}$ are constructed such that $\pi$-modes in the j and k
direction are filtered out, and moreover that polynomials up to degree
$N_c - 1$ are invariant under the averaging. The derivative weights
$w^{\mathrm{diff}}$ are such that polynomials up to degree $N_c$ are
exactly differentiated. The resulting discretization has order $N_c$
on uniform grids. The $\pi$-modes in z-direction are damped by the
viscous derivative. The viscous flux is discretized using repeated
differentiation. The inner derivative is calculated on a staggered
grid. Both the inner and the outer derivatives are discretized
analogously as in the convective flux, on $N_v$ points, except that
now $\pi$-modes are not filtered out. Both derivatives are then of
order $N_v - 1$, but due to symmetry, on a uniform grid, the viscous
flux is discretized up to order $N_v$.
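The role of the two sets of weights can be illustrated in one dimension. The sketch below assumes simple 3-point weights chosen for illustration only (they are not the weights used in the paper): an averaging stencil that removes the $\pi$-mode $(-1)^i$ while leaving linear functions unchanged, and a central difference that cannot see the $\pi$-mode at all.

```python
from fractions import Fraction as F

def apply_stencil(weights, f, i):
    """Apply stencil weights w[-d..d] to the sequence f around index i."""
    d = len(weights) // 2
    return sum(w * f[i + n] for n, w in zip(range(-d, d + 1), weights))

# Illustrative weights (assumption), unit grid spacing.
avg_w = [F(1, 4), F(1, 2), F(1, 4)]    # averaging weights
diff_w = [F(-1, 2), F(0), F(1, 2)]     # second-order central difference

pi_mode = [(-1) ** i for i in range(9)]        # grid-scale oscillation
linear = [F(3) + 2 * i for i in range(9)]      # f(i) = 3 + 2 i

i = 4
pi_avg = apply_stencil(avg_w, pi_mode, i)      # averaging filters the pi-mode
pi_diff = apply_stencil(diff_w, pi_mode, i)    # central difference ignores it
lin_avg = apply_stencil(avg_w, linear, i)      # linear function is invariant
lin_diff = apply_stencil(diff_w, linear, i)    # exact derivative of a linear function
```

Because the central difference returns zero for the $\pi$-mode, the convective term neither creates nor removes it in the differentiated direction, which is consistent with the statement that damping of that mode is left to the viscous derivative.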

Due to the nonlinearity in the convective flux, high frequency modes
arise from a low-frequency initial state. In physical reality, these
are damped by the viscous effects in the fluid. In the numerical
simulation, however, two difficulties arise. The first is that both
the convective and the viscous flux are calculated inaccurately. In
our central differencing discretisations, on relatively coarse grids,
a situation may arise in which the numerical viscous terms do not have
enough dissipation to damp the numerical convective terms, giving rise
to instabilities. The second difficulty is that due to the finite
grid-spacing, there is a maximum wavenumber which can be represented
on the grid. Modes with a higher wavenumber appear as low-frequency
modes on the grid. Therefore, numerically, the effective energy
contained in the low-frequency modes can be increased during the onset
of turbulence. One remedy could be to take a grid that is sufficiently
fine to represent the highest mode which due to physics would emerge
in the simulation. Another possibility is to use upwind-biased
discretizations of the convective flux, as has been done by Rai and
Moin [12]. We have used a discretisation of the viscous flux with a
wider stencil than necessary to achieve the desired order of accuracy.
In this way we constructed a better approximation of the viscous flux.
As an example, we were able to calculate a full transition to
turbulence on $96^3$ points using a fourth order method on a
$5^3$-point stencil for the convective flux, and repeated application
of a fourth order method on $6^3$ points for the viscous flux,
resulting in an $11^3$-point stencil, whereas repeated application of
a $4^3$-point operator for the viscous flux on this grid failed. At
this moment, further investigation is needed to understand this
phenomenon more clearly.
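The aliasing mechanism described above can be checked on a small periodic grid. In this sketch (grid size and wavenumbers are arbitrary illustrative choices), a mode above the highest representable wavenumber is sampled and compared with the low-wavenumber mode it is mistaken for, and a quadratic nonlinearity is shown to produce such a mode from resolved data.

```python
import math

N = 8                                        # grid points on [0, 2*pi)
x = [2.0 * math.pi * j / N for j in range(N)]

# A mode with wavenumber 7 exceeds the maximum wavenumber N/2 = 4 ...
high = [math.cos(7 * xj) for xj in x]
# ... and is indistinguishable on the grid from wavenumber 7 - N = -1.
low = [math.cos(-1 * xj) for xj in x]
max_diff_linear = max(abs(a - b) for a, b in zip(high, low))

# A quadratic term turns the resolved mode k = 3 into k = 6, which the
# grid represents as k = 6 - N = -2: cos(3x)^2 = (1 + cos(6x)) / 2.
squared = [math.cos(3 * xj) ** 2 for xj in x]
aliased = [0.5 * (1.0 + math.cos(2 * xj)) for xj in x]
max_diff_quadratic = max(abs(a - b) for a, b in zip(squared, aliased))
```

Both differences vanish to rounding error, so energy generated at wavenumber 6 is deposited at wavenumber 2 on this grid unless it is damped.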

The DNS mentioned in the previous section has been calculated at Mach
number 0.5. In the future we intend to perform DNS at higher Mach
numbers. For that purpose we need to be able to capture shocks. This
can be done by switching to upwind discretizations in the presence of
a shock, which has been applied successfully to the supersonic
compressible mixing layer, cf. ref. [13]. In that application a fourth
order central difference operator has been used for the convective
term, which was replaced by a third order accurate upwind scheme in
the presence of a shock. See Figure 3. In this way it is possible to
capture time-dependent shocks which appear spontaneously after the
transition to turbulence.

Figure 3: Shock-capturing in 3D turbulent mixing-layer.


3.2  Time  integration 

For the time integration of the resulting discretized equations we use
an explicit 4-stage Runge-Kutta method. We also studied the use of a
second-order accurate implicit method. The system of equations
resulting from the implicit discretization is solved by means of
pseudo-time stepping and accelerated by local pseudo-time stepping and
a nonlinear multigrid technique. Since we use central spatial
discretizations and no artificial dissipation is added to the
equations, the smoothing method is less effective than in the
traditional use of multigrid in steady-state calculations. In the
laminar regime and in the first stages of turbulence the implicit
method provides a speed-up of a factor of 2 relative to the explicit
method on a relatively coarse grid ($64^3$). At increased resolution
this speed-up is enhanced correspondingly. See [14].
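The source does not list the Runge-Kutta coefficients. A minimal sketch of one common explicit 4-stage low-storage form is given below, assuming the stage coefficients 1/4, 1/3, 1/2, 1 (an assumption, not taken from the paper). This family reproduces the fourth-order Taylor polynomial for linear problems and is generally only second-order accurate for nonlinear ones.

```python
import math

def rk4_low_storage(rhs, u, dt, coeffs=(0.25, 1.0 / 3.0, 0.5, 1.0)):
    """One step of a 4-stage scheme: u_k = u_0 + c_k * dt * rhs(u_{k-1})."""
    u0 = u
    for c in coeffs:
        u = u0 + c * dt * rhs(u)   # only u0 and the current stage are kept
    return u

def integrate(rhs, u, dt, steps):
    """Advance u over the given number of equal time steps."""
    for _ in range(steps):
        u = rk4_low_storage(rhs, u, dt)
    return u

# Linear test problem du/dt = -u with u(0) = 1, integrated to t = 1.
u_end = integrate(lambda u: -u, 1.0, 0.1, 10)
error = abs(u_end - math.exp(-1.0))
```

Each stage needs a fresh evaluation of the right-hand side, which is why the dummy layers discussed in the next section have to be exchanged at every stage and not once per time step.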
4 Parallel implementation of the explicit solver

In this section we consider some implementational aspects of the
explicit solver. We use a simple domain-decomposition technique to
obtain an implementation on a parallel computer. This is explained in
the first subsection. In the next subsection we discuss how the
parallel efficiency of the resulting code depends on the spatial
discretization. We distinguish between the intrinsic efficiency of an
algorithm, and the hardware efficiency. The former is related to the
algorithm only, whereas the latter tells us how well a certain
algorithm performs on certain hardware. The quantity which is usually
called the efficiency is the product of these efficiencies. We show
that the intrinsic efficiency of the algorithm decreases as the order
of the spatial discretization increases. We illustrate these concepts
by some performance results obtained from implementations on 3
different parallel machines, viz. the Cray T3d, the Intel Paragon and
the SGI Power Challenge array. Closely related to the concept of
efficiency is the scalability. We discuss the scalability in the sense
of Amdahl and Gustafson (see e.g. ref. [15]).

4.1 Domain decomposition

Suppose our computational domain consists of $N_x \times N_y \times N_z$
gridpoints. This domain is divided into $B_x \times B_y \times B_z$
blocks. For a distributed memory computer, we assume that each block
is allocated on a separate processor. If the total size of the stencil
used for the discretisation is $(2d+1)^3$ (recall that we use central
differences, cf. (11), (12)), then a point which has a distance less
than $d+1$ grid-points from the boundary of a block not coinciding
with the boundary of the physical domain, is called an interior
boundary point. This definition can easily be extended to other
discretisation methods. For the computation of the fluxes for the
interior boundary points, some values of the flow-quantities which
reside on processors dealing with neighbouring blocks are needed. To
store these quantities, each block is dressed with d dummy-layers. In
order to retain the second-order accuracy of the time-integration
method, at each stage in the Runge-Kutta time-integration, these
dummy-layers have to be transferred between the various processors. It
may be clear that the amount of communication increases with the size
of the stencil.

Not only the amount of communication is affected by the size of the
stencil, but also the number of floating point operations increases
with increasing stencil-size. To see why, recall the general form of
the $\partial/\partial x$-operator, eqs. (11)-(12). This derivative is
computed as a one-dimensional derivative acting on two-dimensional
averages over y and z. For the derivative in an internal boundary
point these averages have to be computed for points in the
dummy-layers as well. But these averages are also computed by the
processors dealing with the neighbouring block in order to contribute
to the derivative of some points in that block. For a discretization
applied to $N_x \times N_y \times N_z$ points, careful counting reveals
that the number of floating-point operations for the computation of
one derivative is

$$(3 N_x N_y N_z + 4 d N_y N_z + 2 d N_x N_z + 4 d^2 N_z)(2d - 1).$$

Note that this expression is not symmetric in $N_x$, $N_y$, $N_z$. For
the other derivatives the discrete averaging and differentiation
operators can be applied in such an order that the same expression is
valid. In the case $N_x = N_y = N_z = N$, this reduces to

$$(3N^3 + 6N^2 d + 4 d^2 N)(2d - 1). \qquad (13)$$

Now consider e.g. a given partition of the computational domain into
$B^3$ equal blocks, each containing $(N/B)^3$ points. Then the total
number of floating-point operations to compute a derivative for all
grid-points is

$$\left(3\left(\frac{N}{B}\right)^3 + 6\left(\frac{N}{B}\right)^2 d + 4 d^2 \frac{N}{B}\right)(2d - 1)\,B^3,$$

which is obviously greater than (13).
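The dummy-layer idea can be sketched in one dimension. The example below assumes a periodic field, a fourth-order central first derivative with d = 2, and blocks of equal size; copying neighbour values into the padding stands in for the message passing that a real distributed-memory code would perform. All names are hypothetical.

```python
def central_diff(values, weights, i):
    """Apply central stencil weights w[-d..d] to values around index i."""
    d = len(weights) // 2
    return sum(w * values[i + n] for n, w in zip(range(-d, d + 1), weights))

def split_with_dummy_layers(field, n_blocks, d):
    """Split a periodic 1D field into blocks, each padded with d dummy layers."""
    n = len(field)
    size = n // n_blocks
    blocks = []
    for b in range(n_blocks):
        start = b * size
        # Indices start-d .. start+size+d-1, wrapped periodically; the 2*d
        # extra entries are the values owned by the neighbouring blocks.
        blocks.append([field[(start + j) % n] for j in range(-d, size + d)])
    return blocks

d = 2
weights = [1 / 12, -8 / 12, 0.0, 8 / 12, -1 / 12]   # fourth order, unit spacing
field = [float((3 * i * i) % 17) for i in range(16)]

# Reference: derivative on the undivided periodic domain.
padded = field[-d:] + field + field[:d]
global_result = [central_diff(padded, weights, i + d) for i in range(len(field))]

# Block-wise: every block works only on its own padded copy.
block_result = []
for blk in split_with_dummy_layers(field, 4, d):
    size = len(blk) - 2 * d
    block_result.extend(central_diff(blk, weights, i + d) for i in range(size))
```

With d dummy layers per side the block-wise result matches the single-domain result exactly, and the padding per block grows linearly with d, which is the growth in communication noted above.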


4.2 Parallel efficiency

To quantify the considerations of the previous paragraph, we define
the concept of intrinsic efficiency. Consider a given partition of the
computational domain into $B_x \times B_y \times B_z$ blocks. Denote
the total number of floating point operations for a given number of
timesteps by $f(B_x, B_y, B_z)$. Then the intrinsic efficiency is given
by

$$\alpha_{\mathrm{intr}} = \frac{f(1,1,1)}{f(B_x, B_y, B_z)}. \qquad (14)$$

Note that, on a shared memory machine, if we use fine-grained
parallelism (on do-loop level), we could define
$\alpha_{\mathrm{intr}} = 1$.
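The intrinsic efficiency can be estimated directly from the operation count. The sketch below assumes the cubic-block form of expression (13), with d the half-width of the stencil, and evaluates the ratio f(1,1,1)/f(B,B,B) for a domain of N^3 points split into B^3 equal blocks; function names are hypothetical and the numbers are estimates, not measurements from the paper.

```python
def flops_per_derivative(n, d):
    """Expression (13): operations for one derivative on a block of n^3 points."""
    return (3 * n**3 + 6 * n**2 * d + 4 * d**2 * n) * (2 * d - 1)

def intrinsic_efficiency(n_total, d, b):
    """f(1,1,1) / f(b,b,b) for n_total^3 points in b^3 equal cubic blocks."""
    single = flops_per_derivative(n_total, d)
    blocked = b**3 * flops_per_derivative(n_total / b, d)
    return single / blocked

# Efficiency for stencil half-widths d = 1, 2, 3 on a 64^3 domain.
table = {d: [intrinsic_efficiency(64, d, b) for b in (1, 2, 4, 8)]
         for d in (1, 2, 3)}
for d, row in table.items():
    print(d, [round(v, 3) for v in row])
```

For a fixed number of blocks the ratio falls as d grows, because the redundant averages in the dummy layers take a larger share of the work.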

We can estimate the dependence of the intrinsic efficiency on the size
of the stencil just by counting the number of floating-point
operations for various block-sizes (by using expressions like (13)).
In Figure 4 this has been done for several central differencing
discretizations, using equal shapes and sizes for all blocks. From the
pictures it can be seen that the efficiency decreases rapidly if the
stencil-size grows. Due to the wider stencil, application of
higher-order discretizations results in more floating-point
operations, but this performance penalty is even more severe on
distributed memory systems, where also a decrease of parallel
performance occurs. As an example, consider a central differencing
second order $\partial/\partial x$ operator on a 3-point stencil as
compared to a central differencing fourth order $\partial/\partial x$
operator on a 5-point stencil. To compute the former derivative on a
single-CPU machine costs approximately 5/9 ≈ 0.56 times the time to
compute the latter, whereas on e.g. a 64 x 64 x 32 grid and 128
processors on a distributed memory machine this ratio is approximately
0.33.

Figure 4: Intrinsic efficiency for various spatial discretizations


The intrinsic efficiency deals with the parallelizability of a given
algorithm, regardless of any machine. In fact it gives the maximum
speed-up that can be achieved for the algorithm. In a real
implementation the speed-up will be less, due to e.g. the finite
bandwidth of the machine. To quantify this, we now define the hardware
efficiency $\alpha_{\mathrm{hw}}$. Suppose the CPU time to perform a
certain number of timesteps on one processor using one block is
$T(1,1,1)$. Then, using $B_x B_y B_z$ processors, the CPU time cannot
be shorter than

$$\frac{T(1,1,1)}{B_x B_y B_z\, \alpha_{\mathrm{intr}}(B_x, B_y, B_z)}.$$

In general, due to the finite communications bandwidth of the machine,
the simulation will last longer, say $T(B_x, B_y, B_z)$ seconds. Then
the hardware-efficiency is

$$\alpha_{\mathrm{hw}} = \frac{T(1,1,1)}{T(B_x, B_y, B_z)\, B_x B_y B_z\, \alpha_{\mathrm{intr}}(B_x, B_y, B_z)}. \qquad (15)$$

The traditional (total) efficiency $\alpha$ is the product

$$\alpha = \alpha_{\mathrm{hw}}\, \alpha_{\mathrm{intr}}. \qquad (16)$$

Note that, in general, these efficiencies not only depend on the
number of blocks in each direction, but also on the number of points
per block in each direction, i.e. on the actual shape of the blocks.
This is not only due to the ratio of interior boundary points as
compared to the interior points of each block, but also because many
processors perform better on long inner loops in the code, due to
vectorisation or pipelining.

The efficiency $\alpha$ is related to scalability in the sense of
Amdahl, meaning that a problem which is solved on one processor in
$T_1$ seconds is solved on P processors in $T_1/(P\alpha)$ seconds. We
define one notion of efficiency related to scalability in the sense of
Gustafson. Suppose we solve a problem with N gridpoints on one
processor in $T_1$ seconds, and a problem with PN gridpoints on P
processors in $T_P$ seconds. Then the efficiency $\alpha_G$ is

$$\alpha_G = \frac{T_1}{T_P}. \qquad (17)$$


These concepts are illustrated in Figure 5. Here we performed 5
timesteps on a 64 x 64 x 32 grid, with a 5 point central differencing
discretization of the convective flux, and a repeated application of a
four-point central differencing for the viscous flux, resulting in a
total stencil containing 7 x 7 x 7 points. Plotted are the intrinsic
efficiency and the total efficiency. Because it was not possible to
execute the program on 1 or 2 CPUs on the Paragon, the efficiencies
are based on the timings for the 4-processor run. We used 2 different
distributed memory machines, viz. the Cray T3d and the Intel Paragon.
On these machines, explicit message-passing has been employed. The
actual CPU-times for the runs are tabulated in Table 1. A dash
indicates that it had not been possible to perform the run on the
indicated number of processors, either because the processors do not
have enough memory (in the case of 1 and 2 processors on the Paragon)
or because the indicated number of processors was not available on
that machine. The CPU times are dependent on the actual shape of the
blocks. Therefore in Table 1 we averaged over some block-divisions
which give roughly the same (approximately best) CPU-time. This
dependency is illustrated in Table 2 for the case of 8 blocks. All
timings are accurate to about 5 %. It can be seen that subdivisions
with an equal number of blocks in all directions are optimal. In
general, better subdivisions are obtained by using fewer blocks in
x-direction. This is partly due to the algorithm, since an asymmetry
is introduced by the sequence of averaging-operators in the
derivative-calculations, and partly due to software-pipelining in the
processors, which is reflected in the megaflop-rates (between
parentheses).

# proc.    T3d      Paragon
1          207.5    -
2          109.6    -
4          58.3     90.6
6          42.7     65.2
8          33.4     50.3
12         23.8     36.8
16         17.1     27.4
24         13.5     20.4
32         9.9      15.7
48         7.5      11.7
64         6.0      11.1
96         4.8      7.6
128        3.8      -

Table 1: CPU times in seconds (averaged over several block-divisions).
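The total efficiency can be recomputed from the timings in Table 1. The sketch below assumes the Amdahl-type definition given earlier, rewritten relative to a reference run on p_ref processors so that the Paragon can be normalised by its 4-processor run; taking the 1-processor run as reference for the T3d is an assumption made here for illustration.

```python
# CPU times in seconds from Table 1, keyed by number of processors.
t3d = {1: 207.5, 2: 109.6, 4: 58.3, 6: 42.7, 8: 33.4, 12: 23.8, 16: 17.1,
       24: 13.5, 32: 9.9, 48: 7.5, 64: 6.0, 96: 4.8, 128: 3.8}
paragon = {4: 90.6, 6: 65.2, 8: 50.3, 12: 36.8, 16: 27.4, 24: 20.4,
           32: 15.7, 48: 11.7, 64: 11.1, 96: 7.6}

def total_efficiency(times, p_ref):
    """Efficiency relative to the run on p_ref processors."""
    t_ref = times[p_ref]
    return {p: (p_ref * t_ref) / (p * t) for p, t in times.items()}

t3d_eff = total_efficiency(t3d, p_ref=1)
paragon_eff = total_efficiency(paragon, p_ref=4)

for p in sorted(t3d_eff):
    print(p, round(t3d_eff[p], 2), round(paragon_eff.get(p, float("nan")), 2))
```

Since the two machines use different reference runs, their efficiencies are not directly comparable in absolute terms, only in how they fall off with the number of processors.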


Bx x By x Bz    T3d          Paragon      10^6 flop per block
1x1x8           34.9 (77)    58.0 (47)    335
1x8x1           34.3 (82)    54.4 (52)    353
8x1x1           39.2 (77)    67.3 (45)    378
1x2x4           32.4 (80)    50.9 (52)    323
1x4x2           31.6 (83)    50.0 (53)    328
2x1x4           33.0 (79)    52.9 (50)    327
2x4x1           31.7 (84)    51.5 (53)    335
4x2x1           32.9 (80)    54.6 (50)    327
4x1x2           32.9 (82)    55.4 (49)    339
2x2x2           31.1 (84)    50.3 (53)    325

Table 2: CPU times in seconds for various subdivisions into 8 blocks.
Between parentheses the Mflop-rates. The last column is the number of
millions of floating point operations to be performed for each block.

From the pictures it can be seen that on the T3d and the Paragon, the
machine efficiency is somewhat lower than the algorithmic efficiency.
This means that increasing the algorithmic efficiency by e.g.
exchanging information between the processors after every calculation
of averages will not result in a substantially faster execution of the
code. Further, all efficiencies eventually approach zero as the number
of processors approaches infinity. It can be shown (using expressions
like (13)) that the intrinsic efficiency drops as $B^{-2/3}$, where B
is the total number of blocks. However, $\alpha_G$ remains nearly
constant, as is shown in Table 3. Here each block contains
32 x 16 x 16 points. From this table it follows that, using this
algorithm, doubling the size of the problem and the number of
processors results in equal computation times. This can also be shown
if in (17) the times $T_P$ and $T_1$ are calculated as ideal, i.e.
assuming no communications delays. Then $\alpha_G = 1$.

Figure 5: Efficiency for the T3d (dashed) and the Paragon (dotted).
The solid line is the intrinsic efficiency.


Bx x By x Bz    T3d            Paragon
1x2x1           16.2 (21.1)    26.9 (12.7)
1x4x1           16.3 (42.1)    27.0 (25.4)
1x4x2           16.4 (83.6)    27.3 (50.2)
2x4x2           16.5 (166)     27.6 (99.4)
2x8x2           16.5 (332)     27.4 (200)
2x8x4           16.6 (661)     27.7 (396)
2x8x6           16.6 (991)     27.8 (592)
4x8x4           16.6 (1322)    -

Table 3: CPU times in seconds and Megaflop-rates (between parentheses)
for increasing domain-sizes, illustrating that $\alpha_G$ remains
approximately constant.
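The near-constant scaled efficiency can be read off Table 3 by dividing the time of the smallest run by the time of each larger run. Table 3 has no single-block entry, so the sketch below uses the 2-block run as the reference; that choice is an assumption made here, and the resulting ratios are relative to that run.

```python
# CPU times in seconds from Table 3; each block holds 32 x 16 x 16 points.
n_blocks = [2, 4, 8, 16, 32, 64, 96, 128]
t3d_times = [16.2, 16.3, 16.4, 16.5, 16.5, 16.6, 16.6, 16.6]
paragon_times = [26.9, 27.0, 27.3, 27.6, 27.4, 27.7, 27.8]   # no 128-block run

def scaled_efficiency(times):
    """Time of the smallest run divided by the time of each run."""
    return [times[0] / t for t in times]

t3d_scaled = scaled_efficiency(t3d_times)
paragon_scaled = scaled_efficiency(paragon_times)

for p, a in zip(n_blocks, t3d_scaled):
    print(p, round(a, 3))
```

Going from 2 to 128 blocks on the T3d changes the run time by less than 3 %, which is the behaviour expected when the work per processor is held fixed.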

From the above results it can be concluded that the T3d and the
Paragon show comparable efficiencies for this algorithm, the T3d being
about 40 % faster.

Besides the implementation on the T3d and the Paragon, we have made a
preliminary implementation on the SGI Power Challenge Array. This
machine consists of 4 nodes each comprised of a 16-CPU shared memory
parallel machine. We used explicit message-passing between the nodes.
On each node, fine-grained parallelism has been employed using the
vendor-supplied parallelizing compiler. The combination of
fine-grained parallelism and explicit message passing is not entirely
trivial. On the one hand, using fine-grained parallelism results in an
algorithmic efficiency of 1, since no additional floating-point
operations are introduced. Therefore, this form of parallelism seems
to be promising at first sight. On the other hand, however,
parallelizing a do-loop containing only a few iterations (in the order
of magnitude of the number of grid-points in one direction) causes
much system-overhead, and seriously affects pipelining efficiency.
Moreover, suboptimal speedup can arise due to the cache-coherency
mechanism. The use of explicit message-passing has two disadvantages,
namely an algorithmic efficiency less than one, and usually a slow
data-transfer. The advantage of explicit message-passing as compared
to fine-grained parallelism is that parallelization takes place on a
(much) higher level, leading to less system overhead.


As an example, consider a problem with 64 x 64 x 32 grid-points (the
same as discussed above). With 4 processors on one node working on one
block, this yields an execution time of 23 seconds for 5 Runge-Kutta
timesteps, whereas on 4 nodes with 4 blocks (1x2x2) and one processor
per node the execution time is 18 seconds. As another example, we
compare the subdivision into 1x2x2 and 2x4x2 blocks, both running on 4
nodes. In the first case, each node deals with 1 block, and in the
second case each node does the computations for 4 blocks, and uses 2
processors for each block. So in that case the distributed memory
model is adopted also within each single node. It appears that the
latter case has a shorter execution time. It may be clear that some
restructuring of the code is necessary in order to obtain reasonable
performance. This will be the subject of another paper [16].


4.3  Optimization  for  cache-machines 

In many parallel machines the processors use a hierarchical memory
structure, consisting of a small amount of memory with a short access
time (the cache) and a large amount of main memory with much longer
access time. This long access time is the main reason why the
performance of these machines is far below their (often impressive)
peak. In the implementation of a numerical algorithm, it is essential
to use the cache efficiently. Therefore, the number of load and store
operations should be kept to a minimum, and quantities which are
loaded from main memory should be reused as much as possible before
being stored again. Further, since elements from main memory are
loaded into cache in chunks of a few consecutive elements, do-loops
should be arranged such that main memory is traversed linearly (as is
also necessary for efficient use of traditional vector-processors).
Moreover, this will enable software-pipelining on RISC-processors,
resulting in substantially faster execution.

To illustrate this, we compare two different ways to calculate the
viscous flux. In the first method (method A) the various derivatives
of the velocity fields and the temperature are calculated
consecutively, and the viscous stress tensor and viscous heat flux are
assembled and stored. Then the outer derivatives of the viscous flux
are calculated, again consecutively. The resulting code is very well
vectorizable and consists of very simple do-loops. In the second
method (method B), we use the following observation. In the
calculation of the derivatives, some averages can be used to
contribute to various derivatives. Moreover, for all derivatives, the
averaging weights in one direction are equal. Therefore we calculate
all inner derivatives simultaneously, which also has the advantage
that e.g. a vector $u_i$ needs to be loaded only once for the
calculation of all its derivatives. An analogous fact holds for the
weights. Further, the derivatives are not stored, but directly used to
assemble the stress tensor and the heat flux. After that, all outer
derivatives are calculated simultaneously. This results in about 30%
fewer floating point operations, and substantially fewer load and
store operations, resulting in better memory-performance. The drawback
is the occurrence of (much) more complicated do-loop bodies, which
puts a severe demand on the compiler in order to obtain suitable
pipelining. It appears that on the T3d and the Paragon there is hardly
any performance gain, and the performance is only about 20 % of peak.
On one R8000 processor in the SGI Power Challenge (coupled to 4 MBytes
of cache), the CPU-time of method B is half that of method A, with a
performance of about 37 % of peak (110 Mflops). More details are to be
found in ref. [17].
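The difference between the two methods can be sketched on a one-dimensional toy problem. The example below assumes a simplified flux built from the derivatives of one velocity component and the temperature with constant coefficients mu and kappa; it is not the viscous flux of the paper, only an illustration of separate passes with stored intermediates (method A) against one fused pass (method B).

```python
def flux_method_a(u, temp, mu, kappa):
    """Separate simple loops; every derivative is stored before assembly."""
    n = len(u)
    du = [0.0] * n                      # stored intermediate
    dtemp = [0.0] * n                   # stored intermediate
    for i in range(1, n - 1):
        du[i] = 0.5 * (u[i + 1] - u[i - 1])
    for i in range(1, n - 1):
        dtemp[i] = 0.5 * (temp[i + 1] - temp[i - 1])
    flux = [0.0] * n
    for i in range(1, n - 1):           # third pass re-loads du and dtemp
        flux[i] = mu * du[i] + kappa * dtemp[i]
    return flux

def flux_method_b(u, temp, mu, kappa):
    """One fused loop; derivatives are used at once and never stored."""
    n = len(u)
    flux = [0.0] * n
    for i in range(1, n - 1):
        du = 0.5 * (u[i + 1] - u[i - 1])
        dtemp = 0.5 * (temp[i + 1] - temp[i - 1])
        flux[i] = mu * du + kappa * dtemp
    return flux

u = [float((7 * i) % 5) for i in range(12)]
temp = [300.0 + float((3 * i) % 4) for i in range(12)]
flux_a = flux_method_a(u, temp, mu=0.1, kappa=0.2)
flux_b = flux_method_b(u, temp, mu=0.1, kappa=0.2)
```

Both functions return the same values; method B avoids two full-size temporary arrays and one pass over memory, at the price of a more complicated loop body.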
1. ABSTRACT

A versatile and effective numerical code for direct and large-eddy
simulations of compressible flows is described. It is based on robust
explicit finite-difference methods which are second-order accurate in
time and fourth-order accurate in space (Gottlieb & Turkel, 1976). An
industrial application is presented, with comparison to a more
fundamental case, tackled with spectral and compact schemes.

2.  INTRODUCTION 

Traditionally, the algorithmic concern in CFD has been the (fast)
convergence of steady calculations of flows over complex objects.
Unsteadiness in the CFD context is generally associated with changes
of angle of attack or geometry (in the case of store separation, for
example), but scarcely with the Tollmien-Schlichting waves or the
hairpin vortices which develop within the boundary layers around these
objects. Such events are generally considered as “turbulent
fluctuations” and are either ignored or expected to be accounted for
through one-point-closure turbulence models. This might yield
acceptable prediction of the overall drag over an aircraft, but fails
at predicting, for example, the length of the transitional region in a
boundary layer subjected to a given level of perturbations. The reason
for this is that the physical mechanisms (of transition, turbulence or
separation) are not understood at a fundamental level. There is
therefore a great need for numerical simulations of transitional,
turbulent or separated flows. Considering the computational resources
currently available, two strategies are possible:

- Direct Numerical Simulations, in which all turbulent scales are
simulated explicitly, in three dimensions of space, down to the
Kolmogorov scale $\eta$ (or nearly so). This implies high-order
unsteady schemes, small time-steps, very fine 3D grids and, in
practice, low Reynolds numbers. We would like to stress that it is not
because a given scheme solves the complete Navier-Stokes equations in
three dimensions that its solutions automatically deserve the DNS
label. If the mesh size in a turbulent region is larger than, say,
$10\eta$, we cannot speak of DNS. Some use the expression
“pseudo-DNS”. The problem in this case is that the amount of
dissipation brought about by the grid being too coarse is not
controlled.

Institut National Polytechnique de Grenoble (INPG), Université Joseph
Fourier (UJF) and Centre National de la Recherche Scientifique (CNRS).


- Large-Eddy Simulations, a half-way house (Leschziner, 1995) between
DNS and one-point closures. It consists of simulating explicitly and
in three dimensions all motion larger than a certain cut-off scale,
accounting for the contribution of the smaller scales through a simple
algebraic model. This presupposes that the large scales are more
important than the small ones, which is certainly true for turbulence
but is more doubtful for combustion, for example. In any case, from
the point of view of algorithmics, the numerical methods used for LES
are the same as for DNS, except that the subgrid-scale turbulence
model induces non-linearities in the dissipative terms (in addition to
those which come from the dependence of molecular viscosity with
respect to local temperature).
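The resolution criterion quoted for DNS can be turned into a simple check. The sketch below uses the standard definition of the Kolmogorov scale, $\eta = (\nu^3/\varepsilon)^{1/4}$, and the threshold of about $10\eta$ mentioned above; the viscosity, dissipation rate and mesh sizes are illustrative values chosen here, not data from the paper.

```python
def kolmogorov_scale(nu, epsilon):
    """Kolmogorov length scale from kinematic viscosity and dissipation rate."""
    return (nu ** 3 / epsilon) ** 0.25

def qualifies_as_dns(mesh_size, nu, epsilon, factor=10.0):
    """True if the mesh size does not exceed factor * eta."""
    return mesh_size <= factor * kolmogorov_scale(nu, epsilon)

# Illustrative values (assumption): air-like viscosity in m^2/s and a
# dissipation rate of 1 m^2/s^3.
nu = 1.5e-5
epsilon = 1.0
eta = kolmogorov_scale(nu, epsilon)

fine_ok = qualifies_as_dns(1.0e-3, nu, epsilon)
coarse_ok = qualifies_as_dns(5.0e-3, nu, epsilon)
```

With these values $\eta$ is roughly a quarter of a millimetre, so a 1 mm mesh passes the check and a 5 mm mesh does not; the latter would fall under the “pseudo-DNS” label or would need a subgrid-scale model.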

One controversial question (within the scope of this panel) is the
role that numerical dissipation can play in the turbulence-modelling
process, either through the nature of the scheme or the mesh size. In
our LES, the solutions to the equations solved do contain a certain
level of kinetic energy in the smallest resolved scales. This is
sometimes criticized on the ground that all numerical methods behave
badly in the small scales (even the spectral methods blur the phase
information at the highest wavenumber). Validation then has to be
performed on physical grounds, through comparison with experimental
data, predictions of stability theories or numerical results obtained
with different methods. Note that Leonard (1974), who coined the
expression Large-Eddy Simulation, proposed a formalism thanks to which
no energy would be left in the smallest resolved scales.¹ At the other
extreme, some claim that numerical dissipation can play the role of a
subgrid-scale turbulence model, and sometimes that of the molecular
viscous terms too (approaches referred to as Monotonically Integrated
LES, Built-In LES, and so on).

Before giving our point of view, we will briefly recall the
very classical numerical methods we use for the simulation
of compressible flows which do not develop strong
shocks. The subgrid-scale turbulence models that we
currently use are briefly presented in section 4, although
they will be presented in more detail in Lesieur & Métais
(1996). The soundest of these models is then applied
to the LES of an incompressible mixing layer performed
with spectral-like methods renowned for their low numerical
dissipation and dispersion. The same model is finally
applied to a more industrial mixing layer simulated with
the code described below.

^The Navier-Stokes equations are first convolved with
a continuous low-pass filter which commutes with the time
and space derivatives. The resulting equations are then closed
thanks to a subgrid-scale turbulence model. This closed system
of equations is eventually discretized onto a grid which is finer
than the cut-off scale of the filter, so that the result can be
checked to be independent of the mesh size (but of course not
of the filter's cut-off scale).


3.  NUMERICAL  SCHEME 

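In cartesian co-ordinates, the compressible LES equations
can be cast, after several crude simplifications discussed
in Comte et al. (1994), in the conservation-like form

$$\frac{\partial U}{\partial t} + \frac{\partial F_1}{\partial x_1} + \frac{\partial F_2}{\partial x_2} + \frac{\partial F_3}{\partial x_3} = 0 \qquad (1)$$

with

$$U = {}^T(\rho, \rho u, \rho v, \rho w, \rho e), \qquad (2)$$

where $(u, v, w) = (u_1, u_2, u_3)$ are the velocity components and
$\rho e$ stands for the resolved total energy defined,
for an ideal gas (air), by

$$\rho e = \rho C_v T + \tfrac{1}{2}\rho\,(u_1^2 + u_2^2 + u_3^2). \qquad (3)$$

The fluxes $F_i$ read, $\forall i \in \{1,2,3\}$,

$$F_i = \begin{pmatrix}
\rho u_i \\
\rho u_i u_1 + \rho R T\,\delta_{i1} - (\mu + \rho\nu_t)\,\tau_{i1} \\
\rho u_i u_2 + \rho R T\,\delta_{i2} - (\mu + \rho\nu_t)\,\tau_{i2} \\
\rho u_i u_3 + \rho R T\,\delta_{i3} - (\mu + \rho\nu_t)\,\tau_{i3} \\
\rho (e + RT)\,u_i - \mu\,\tau_{ij} u_j - (k + \rho C_p \kappa_t)\,\dfrac{\partial T}{\partial x_i}
\end{pmatrix} \qquad (4)$$

with $R = 287.06$ J kg$^{-1}$ K$^{-1}$ and

$$\tau_{ij} = \frac{\partial u_j}{\partial x_i} + \frac{\partial u_i}{\partial x_j} - \frac{2}{3}\,(\nabla\cdot\mathbf{u})\,\delta_{ij} \qquad (5)$$

the deviatoric part of the resolved strain-rate tensor.
Molecular viscosity is prescribed through Sutherland's law

$$\mu(T) = \mu(273.15)\,\sqrt{\frac{T}{273.15}}\;\frac{1 + S/273.15}{1 + S/T} \qquad \forall\, T \ge 120 \qquad (6a)$$

with $\mu(273.15) = 1.711 \times 10^{-5}$ Pl and $S = 110.4$, and its
extension to temperatures lower than 120 K:

$$\mu(T) = \mu(120)\,\frac{T}{120} \qquad \forall\, T < 120 \qquad (6b)$$

$\mu(120)$ being given by eq. (6a). The molecular conductivity
$k(T)$ derives from the constant-Prandtl-number assumption
$Pr = C_p\,\mu(T)/k(T) = 0.7$. To be closed, this set
of equations requires the definition of $\nu_t$ and $\kappa_t$, the
eddy-viscosity and eddy-diffusivity coefficients provided by the
SGS model used. This will be done in the next section.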
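The viscosity and conductivity laws above can be sketched in a few lines. This is only an illustration under stated assumptions: SI units, a reference viscosity of 1.711e-5 Pa.s at 273.15 K, and a specific-heat ratio of 1.4 for air (not given in the text) to obtain $C_p$ from $R$. All function and constant names are hypothetical.

```python
import math

MU_REF = 1.711e-5   # Pa.s at T_REF (assumed reading of the value quoted with eq. (6a))
T_REF = 273.15      # K
S_SUTH = 110.4      # K, Sutherland constant
T_LOW = 120.0       # K, below which the linear extension (6b) applies
R_GAS = 287.06      # J/kg/K
GAMMA = 1.4         # assumption: ratio of specific heats for air, not stated in the text
PRANDTL = 0.7


def sutherland(T):
    """Sutherland's law in the form of eq. (6a)."""
    return MU_REF * math.sqrt(T / T_REF) * (1.0 + S_SUTH / T_REF) / (1.0 + S_SUTH / T)


def viscosity(T):
    """Molecular viscosity: eq. (6a) for T >= 120 K, linear extension (6b) below."""
    if T >= T_LOW:
        return sutherland(T)
    return sutherland(T_LOW) * T / T_LOW


def conductivity(T):
    """Molecular conductivity from the constant-Prandtl-number assumption."""
    cp = GAMMA * R_GAS / (GAMMA - 1.0)
    return cp * viscosity(T) / PRANDTL
```

With these assumptions the two branches of the viscosity law join continuously at 120 K, which is the point of the extension (6b).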


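The adaptation to curvilinear co-ordinates was done by
David (1993), following Viviand (1974) (see also the complete
development in Fletcher, 1988), keeping the spanwise
co-ordinate $x_3$ cartesian. The chain rule gives

$$\frac{\partial}{\partial x_i} = \frac{\partial \xi_1}{\partial x_i}\frac{\partial}{\partial \xi_1} + \frac{\partial \xi_2}{\partial x_i}\frac{\partial}{\partial \xi_2} \qquad (7)$$

for any regular co-ordinate transformation $(x_1, x_2, x_3) \to
(\xi_1, \xi_2, \xi_3 = x_3)$. Introducing the Jacobian

$$J = \det\begin{pmatrix}
\partial\xi_1/\partial x_1 & \partial\xi_1/\partial x_2 & 0 \\
\partial\xi_2/\partial x_1 & \partial\xi_2/\partial x_2 & 0 \\
0 & 0 & 1
\end{pmatrix} \qquad (8)$$

one can then re-write (1) as

$$\frac{\partial \hat U}{\partial t} + \frac{\partial \hat F}{\partial \xi_1} + \frac{\partial \hat G}{\partial \xi_2} + \frac{\partial \hat H}{\partial x_3} = 0 \qquad (9)$$

with

$$\hat U = \frac{U}{J} \qquad (10a)$$

$$\hat F = \frac{1}{J}\left[\left(\frac{\partial \xi_1}{\partial x_1}\right) F_1 + \left(\frac{\partial \xi_1}{\partial x_2}\right) F_2\right] \qquad (10b)$$

$$\hat G = \frac{1}{J}\left[\left(\frac{\partial \xi_2}{\partial x_1}\right) F_1 + \left(\frac{\partial \xi_2}{\partial x_2}\right) F_2\right] \qquad (10c)$$

$$\hat H = \frac{F_3}{J} \qquad (10d)$$

using the chain rule (7) for the derivatives arising in the
fluxes $\hat F$, $\hat G$ and $\hat H$. Vector $U$ is still a function of the
cartesian co-ordinates $x_i$ and time $t$. In the limit of zero
viscosity and conductivity (Euler equations without SGS
model), the fluxes $F_i$ - still defined by (4) - would be
functions of $U$ only.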


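For a given 2D geometry, a nearly-orthogonal curvilinear
grid $\xi_1(x_1,x_2)$, $\xi_2(x_1,x_2)$ is generated by Ryskin's method,
in such a way that each boundary of the domain corresponds
either to a line of constant $\xi_1$ or of constant $\xi_2$.
This grid is then made 3D by spanwise translation. The
system (9) is solved on this grid by means of a (2,4)
extension of the fully-explicit MacCormack scheme devised
by Gottlieb and Turkel (1976), in the form

$$\begin{aligned}
\hat U^{(p)}_{i,j,k} = \hat U^{n}_{i,j,k}
&- \frac{\Delta t}{\Delta \xi_1}\left[\frac{7}{6}\left(\hat F^{n}_{i+1,j,k} - \hat F^{n}_{i,j,k}\right) - \frac{1}{6}\left(\hat F^{n}_{i+2,j,k} - \hat F^{n}_{i+1,j,k}\right)\right] \\
&- \frac{\Delta t}{\Delta \xi_2}\left[\frac{7}{6}\left(\hat G^{n}_{i,j+1,k} - \hat G^{n}_{i,j,k}\right) - \frac{1}{6}\left(\hat G^{n}_{i,j+2,k} - \hat G^{n}_{i,j+1,k}\right)\right] \\
&- \frac{\Delta t}{\Delta x_3}\left[\frac{7}{6}\left(\hat H^{n}_{i,j,k+1} - \hat H^{n}_{i,j,k}\right) - \frac{1}{6}\left(\hat H^{n}_{i,j,k+2} - \hat H^{n}_{i,j,k+1}\right)\right]
\end{aligned} \qquad (11a)$$

$$\begin{aligned}
\hat U^{n+1}_{i,j,k} = \frac{1}{2}\Big\{ \hat U^{n}_{i,j,k} + \hat U^{(p)}_{i,j,k}
&- \frac{\Delta t}{\Delta \xi_1}\left[\frac{7}{6}\left(\hat F^{(p)}_{i,j,k} - \hat F^{(p)}_{i-1,j,k}\right) - \frac{1}{6}\left(\hat F^{(p)}_{i-1,j,k} - \hat F^{(p)}_{i-2,j,k}\right)\right] \\
&- \frac{\Delta t}{\Delta \xi_2}\left[\frac{7}{6}\left(\hat G^{(p)}_{i,j,k} - \hat G^{(p)}_{i,j-1,k}\right) - \frac{1}{6}\left(\hat G^{(p)}_{i,j-1,k} - \hat G^{(p)}_{i,j-2,k}\right)\right] \\
&- \frac{\Delta t}{\Delta x_3}\left[\frac{7}{6}\left(\hat H^{(p)}_{i,j,k} - \hat H^{(p)}_{i,j,k-1}\right) - \frac{1}{6}\left(\hat H^{(p)}_{i,j,k-1} - \hat H^{(p)}_{i,j,k-2}\right)\right] \Big\}
\end{aligned} \qquad (11b)$$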
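The predictor-corrector structure of the (2,4) scheme is perhaps easiest to see in one dimension. The sketch below applies it to a scalar conservation law on a periodic grid; it is not the authors' code, and the function names, the linear flux and the CFL number of 0.25 are illustrative assumptions.

```python
import math


def gt24_step(u, flux, dt, dx):
    """One step of the (2,4) MacCormack scheme for du/dt + df(u)/dx = 0 (periodic grid)."""
    n = len(u)
    lam = dt / dx
    f = [flux(v) for v in u]
    # Predictor: forward-biased one-sided differences (points i, i+1, i+2).
    up = [u[i] - lam * (7.0 / 6.0 * (f[(i + 1) % n] - f[i])
                        - 1.0 / 6.0 * (f[(i + 2) % n] - f[(i + 1) % n]))
          for i in range(n)]
    fp = [flux(v) for v in up]
    # Corrector: backward-biased differences (points i, i-1, i-2);
    # negative list indices wrap around, which gives the periodicity.
    return [0.5 * (u[i] + up[i] - lam * (7.0 / 6.0 * (fp[i] - fp[i - 1])
                                         - 1.0 / 6.0 * (fp[i - 1] - fp[i - 2])))
            for i in range(n)]


def advect_sine(n=64, cfl=0.25, periods=1):
    """Advect one sine wave at unit speed over `periods` periods; return the max error."""
    dx = 1.0 / n
    dt = cfl * dx
    steps = int(round(periods / dt))
    u0 = [math.sin(2.0 * math.pi * i * dx) for i in range(n)]
    u = list(u0)
    for _ in range(steps):
        u = gt24_step(u, lambda v: v, dt, dx)
    return max(abs(a - b) for a, b in zip(u, u0))
```

For linear advection, a von Neumann analysis of this stencil gives a stability limit of CFL = 2/3, so the default of 0.25 used here is comfortably inside the stable range.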

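As mentioned in Thompson et al. (1985) and recalled
in Fletcher (1988), the metrics $\partial\xi_i/\partial x_j$ arising in the
fluxes and Jacobians above have to be discretized in such
a way that unwanted cross-terms cancel out, otherwise
the scheme is not consistent. First of all, they have to be
expressed as analytic functions of the metrics $\partial x_l/\partial\xi_m$ of
the inverse transform, in order to eliminate all derivatives
with respect to $x_1$ and $x_2$ in (9) and (10). These inverse
metrics are discretized in the following manner:

$$\frac{\partial x_l}{\partial \xi_1} = \begin{cases}
\dfrac{-\frac{1}{6}\,(x_l)_{i+2,j} + \frac{8}{6}\,(x_l)_{i+1,j} - \frac{7}{6}\,(x_l)_{i,j}}{\Delta\xi_1} & \text{in the predictor step (11a), and} \\[2ex]
\dfrac{\frac{7}{6}\,(x_l)_{i,j} - \frac{8}{6}\,(x_l)_{i-1,j} + \frac{1}{6}\,(x_l)_{i-2,j}}{\Delta\xi_1} & \text{in the corrector step (11b)}
\end{cases} \qquad (12a)$$

and

$$\frac{\partial x_l}{\partial \xi_2} = \begin{cases}
\dfrac{-\frac{1}{6}\,(x_l)_{i,j+2} + \frac{8}{6}\,(x_l)_{i,j+1} - \frac{7}{6}\,(x_l)_{i,j}}{\Delta\xi_2} & \text{in the predictor step (11a), and} \\[2ex]
\dfrac{\frac{7}{6}\,(x_l)_{i,j} - \frac{8}{6}\,(x_l)_{i,j-1} + \frac{1}{6}\,(x_l)_{i,j-2}}{\Delta\xi_2} & \text{in the corrector step (11b)}
\end{cases} \qquad (12b)$$

This is only first-order accurate, which is justified by the
fact that the grids we use are not very distorted, except
very locally. Therefore $\partial x_l/\partial\xi_m$ remains almost everywhere
close to $\delta_{lm}$.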
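As a short worked check (a sketch for a smooth function $f$ sampled with uniform spacing $h$; the operators $D^{+}$ and $D^{-}$ are notation introduced here, not in the original), the two one-sided stencils used in the predictor and corrector steps are individually first-order accurate, but their average is the classical fourth-order centred difference:

```latex
\begin{align*}
D^{+} f_i &= \frac{-f_{i+2} + 8 f_{i+1} - 7 f_i}{6h} = f'_i + \frac{h}{3} f''_i + O(h^3), \\
D^{-} f_i &= \frac{7 f_i - 8 f_{i-1} + f_{i-2}}{6h} = f'_i - \frac{h}{3} f''_i + O(h^3), \\
\tfrac{1}{2}\left(D^{+} + D^{-}\right) f_i &= \frac{-f_{i+2} + 8 f_{i+1} - 8 f_{i-1} + f_{i-2}}{12h} = f'_i + O(h^4).
\end{align*}
```

The leading errors of the two steps are equal and opposite, so they cancel when the predictor and corrector are combined.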


In the same way, the chain rule (7) has to be applied to
eliminate all derivatives with respect to $x_1$ and $x_2$ from
the fluxes $F_i$. This introduces metrics to be evaluated
as said above, together with derivatives of velocity and
temperature with respect to $\xi_1$ and $\xi_2$. Consistency then
determines the way these derivatives, and also
$\partial/\partial\xi_3 = \partial/\partial x_3$, should be discretized.

The boundary conditions are based on a decomposition
into characteristics, in the spirit of Thompson (1987, 1990)
and Poinsot and Lele (1992). The Riemann invariants
of outgoing characteristics are extrapolated, whereas the
incoming ones are either prescribed (e.g. at the inflow
boundary) or set to zero (non-reflective or open boundary
condition). For example, going back to cartesian
co-ordinates for the sake of simplicity, in the case of a
boundary perpendicular to the direction $x_1$, the Euler
equations are recast in their quasi-linear form

$$\frac{\partial V}{\partial t} + A\,\frac{\partial V}{\partial x_1} = 0, \quad \text{with} \quad V = (\rho, \rho u_1, \rho u_2, \rho u_3, p) \qquad (13)$$

The matrix $A$ is, as per usual, diagonalized in the form
$A = L^{-1} \Lambda L$. Assuming $L$ to be locally constant and
introducing the vector $W = LV$, system (13) decouples into
5 equations of the form

$$\frac{\partial w}{\partial t} + \lambda\,\frac{\partial w}{\partial x_1} = 0 \qquad (14)$$

to be solved at the boundary point $N$ through the
semi-implicit scheme

$$\frac{w_N^{n+1} - w_N^{n}}{\Delta t}
+ \frac{\lambda_N^{n} + |\lambda_N^{n}|}{2}\left[\frac{w_N^{n+1} - w_{N-1}^{n+1}}{\Delta x_1}\right]
+ \frac{\lambda_N^{n} - |\lambda_N^{n}|}{2}\left[\frac{w_{N+1}^{n+1} - w_N^{n+1}}{\Delta x_1}\right] = 0 \qquad (15)$$

For the outgoing characteristics ($\lambda_N > 0$), the values of
$w_N^{n+1}$ are obtained from that of $\lambda_N^{n}$, $w_N^{n}$ and $w_{N-1}^{n+1}$, which
are supposed to be known. For the incoming characteristics
($\lambda_N < 0$), it is necessary to prescribe $w_{N+1}^{n+1}$ in order
to pull out $w_N^{n+1}$. This is done by considering the nature
of the boundary condition (adherence, free slip, periodicity,
prescribed flow rate, non-reflectivity, inter-block
matching...). $V_N^{n+1}$ is finally deduced from $W_N^{n+1}$,
assuming simply $L_N^{n+1} = L_N^{n}$.


3. RAPID OVERVIEW OF OUR SGS MODELS
AND THEIR RECENT EVOLUTIONS

Assuming spectra $E(k) \propto k^{-m}$ for all $k$, Métais & Lesieur
(1992) proposed models defined in the spectral space,
reading in a simplified form

$$\nu_t(k,t) = 0.31\,\frac{5-m}{m+1}\,\sqrt{3-m}\;C_K^{-3/2}\left[\frac{E(k_c,t)}{k_c}\right]^{1/2} \nu_t^{*}(k/k_c) \quad \text{for } m < 3, \qquad (16a)$$

and

$$\nu_t(k,t) = 0 \quad \text{for } m > 3, \qquad (16b)$$

with

$$\kappa_t(k,t) = \nu_t(k,t)/Pr_t \quad \text{with } Pr_t = 0.6 \qquad (16c)$$


$C_K$ denotes Kolmogorov's constant, and $\nu_t^{*} = 1$ for
$k/k_c \lesssim 0.3$. It rises for higher $k/k_c$; a good fit of it is
(in the case $m = 5/3$ at least),

$$\nu_t^{*}(k/k_c) = 1 + 34.5\,\exp[-3.03\,k_c/k]. \qquad (17)$$
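A minimal sketch of how the spectral eddy viscosity could be evaluated is given below. It assumes that the prefactor of (16a) reads $0.31\,\frac{5-m}{m+1}\sqrt{3-m}\;C_K^{-3/2}\,[E(k_c)/k_c]^{1/2}$, and it uses a Kolmogorov constant of 1.4, which is an assumed value not given in the text. Function names are hypothetical.

```python
import math

C_K = 1.4  # Kolmogorov constant: assumed value, not stated in the text


def nu_t_star(k_over_kc):
    """Cusp function fitted by eq. (17) (valid for m = 5/3 at least)."""
    return 1.0 + 34.5 * math.exp(-3.03 / k_over_kc)


def spectral_eddy_viscosity(k, k_c, e_kc, m=5.0 / 3.0):
    """Eddy viscosity nu_t(k) for a spectrum E ~ k**(-m) near the cut-off k_c.

    e_kc is the kinetic-energy spectrum evaluated at k_c; the model is
    switched off for slopes m >= 3, following (16b).
    """
    if m >= 3.0:
        return 0.0
    plateau = (0.31 * (5.0 - m) / (m + 1.0) * math.sqrt(3.0 - m)
               * C_K ** -1.5 * math.sqrt(e_kc / k_c))
    return plateau * nu_t_star(k / k_c)


def spectral_eddy_diffusivity(k, k_c, e_kc, m=5.0 / 3.0, pr_t=0.6):
    """Eddy diffusivity from the turbulent Prandtl number of (16c)."""
    return spectral_eddy_viscosity(k, k_c, e_kc, m) / pr_t
```

For $m = 5/3$ the numerical prefactor evaluates to about 0.447, and the cusp function stays within 0.2% of unity up to $k/k_c = 0.3$ before rising to roughly 2.7 at the cut-off.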


Until now, this model has been used with a fixed value
$m = 5/3$, giving satisfactory results, not only in the
case of isotropic turbulence but also for stratified and/or
rotating homogeneous turbulence and temporally-growing
free shear flows (mixing layers, wakes). For
streamwise-and-spanwise-periodic wall-bounded flows, the easiest way
of accommodating grid refinement at the wall is to work
on $xz$ planes, parallel to the wall, over which 2D spectra
$E_{2D}(k_{2D}, y, t)$ can be computed. Assuming again isotropy
with $E(k) \propto k^{-m}$, one can relate $E_{2D}$ to $E$ and
express eddy viscosity and conductivity $\nu_t(k_{2D}, y, t)$ and
$\kappa_t(k_{2D}, y, t)$ from (16a). One of us (E.L.) did it in the case
of a plane turbulent channel flow. With $m = 5/3$, results
are qualitatively correct, but the wall shear stress $\tau_w$ is
underestimated by about 20% (Fig. 1, top). This is because
the model is too dissipative near the wall, where
experimental measurements show spectra steeper than
$k^{-5/3}$. Much better statistics are obtained with a variable
$m(y,t)$ estimated at each timestep from $E_{2D}(k_{2D}, y, t)$
through a least-squares fit between $k_{2Dc}/2$ and $k_{2Dc}$, the
cut-off wavenumber (Fig. 1, bottom).
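The slope estimate described above amounts to a linear regression in log-log co-ordinates. The sketch below is one possible way of doing it; the fitting band (from half the cut-off wavenumber up to the cut-off) is treated as an adjustable assumption, and the function name is hypothetical.

```python
import math


def fit_spectral_slope(k, e, k_c, k_min=None):
    """Least-squares estimate of m such that E(k) ~ k**(-m) for k_min <= k <= k_c.

    k and e are sequences of wavenumbers and spectrum values; k_min defaults
    to k_c / 2 (an assumed choice of fitting band).
    """
    if k_min is None:
        k_min = 0.5 * k_c
    pts = [(math.log(ki), math.log(ei))
           for ki, ei in zip(k, e)
           if k_min <= ki <= k_c and ei > 0.0]
    if len(pts) < 2:
        raise ValueError("need at least two spectrum samples in the fitting band")
    mean_x = sum(x for x, _ in pts) / len(pts)
    mean_y = sum(y for _, y in pts) / len(pts)
    sxx = sum((x - mean_x) ** 2 for x, _ in pts)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in pts)
    return -sxy / sxx  # slope of log E versus log k is -m
```

A slope steeper than 3 returned by such a fit would switch the model off through (16b), which is how a variable $m$ can deactivate the eddy viscosity where the spectrum decays quickly.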

These results correspond to simulations at $R = U_{c,lam}\,h/\nu =
5000$, in which $h$ denotes the channel's half height and
$U_{c,lam}$ the centerline velocity of a laminar Poiseuille flow of
same flow rate (usual convention). This should yield $R_\tau =
u_\tau h/\nu \approx 200$, which is the case for the top plot of Fig. 1
(instead of $\approx 180$ for the bottom one). Both calculations
are performed by means of de-aliased pseudo-spectral methods
on $xz$ planes and 6th order compact schemes in the
$y$ direction (details will be provided in Lamballais et al.,
in preparation). The resolution is $64 \times 65 \times 32$, for a domain
of size $2\pi h \times 2h \times \pi h$, so that the cut-off wavenumbers
along $x$ and $z$ are the same. Extension to non-square
meshes is in progress; in particular, the procedure proposed
in Scotti et al. (1993) is being tested.

Fig. 1 - $u_{rms}$ (solid), $v_{rms}$ (dotted) and $w_{rms}$ (dashed)
obtained from 2 LES differing only in the determination of $m$:
set to 5/3 in the top plot and evaluated from $E_{2D}(k_{2D}, y, t)$,
which turned the model off in the viscous sublayer (bottom
plot). In both plots, the symbols correspond to LES by Piomelli
(1993) with Germano-Lilly's dynamic model, which are
very close to experimental measurements.

When spectral methods cannot be used, we strive to determine
eddy-viscosities out of a measure of the kinetic
energy at the smallest resolved scale $\Delta = \pi/k_c$. One of
these local spectra is $F_2^{\Delta}(\mathbf{x}, t)$, the second-order structure
function of the resolved velocity field, evaluated by averaging
over the closest neighbours of point $\mathbf{x}$, either in all 3
directions of space (6-neighbour formulation) or on planes
parallel to the wall or normal to the mean shear (4-neighbour
formulation). In the case of infinite Kolmogorov spectra,
energy-conservation arguments (Leslie & Quarini, 1979) yield the
structure-function model (Métais & Lesieur, 1992), defined by

$$\nu_t^{SF}(\mathbf{x}, t) = 0.105\; C_K^{-3/2}\; \Delta\; \sqrt{F_2^{\Delta}(\mathbf{x}, t)}, \qquad (18)$$

consistent with the spectral model (16a).
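The structure-function model only needs velocity differences between a grid point and its neighbours, as the following sketch illustrates for the 6-neighbour formulation on a uniform periodic grid. The nested-list storage, the periodic wrap-around and the Kolmogorov constant of 1.4 are assumptions made for the example, and the function names are hypothetical.

```python
import math

C_K = 1.4  # Kolmogorov constant: assumed value, not stated in the text


def structure_function(u, i, j, k):
    """6-neighbour second-order structure function at node (i, j, k).

    u is a triple (ux, uy, uz) of nested lists indexed [i][j][k] on a
    periodic grid; the result is the mean squared velocity increment
    between the node and its six closest neighbours.
    """
    nx, ny, nz = len(u[0]), len(u[0][0]), len(u[0][0][0])
    neighbours = [((i + 1) % nx, j, k), ((i - 1) % nx, j, k),
                  (i, (j + 1) % ny, k), (i, (j - 1) % ny, k),
                  (i, j, (k + 1) % nz), (i, j, (k - 1) % nz)]
    total = 0.0
    for a, b, c in neighbours:
        total += sum((comp[a][b][c] - comp[i][j][k]) ** 2 for comp in u)
    return total / len(neighbours)


def nu_t_sf(u, i, j, k, delta):
    """Eddy viscosity of the structure-function model, in the form of eq. (18)."""
    return 0.105 * C_K ** -1.5 * delta * math.sqrt(structure_function(u, i, j, k))
```

Because no derivative is taken, the same routine can be used unchanged whatever the order of the scheme that produced the velocity field.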

This SF-model appears to be slightly less dissipative than
the Smagorinsky model with the constant 0.18 given by
the same assumptions (infinite Kolmogorov cascade, see
e.g. Comte et al., 1994). As it involves velocity increments
instead of derivatives, it also has the advantage
of being defined independently of the numerical scheme
used. It is nevertheless not much better for transition
than the Smagorinsky model: low-wavenumber velocity
fluctuations corresponding to unstable modes yield values
of $\nu_t$ large enough to affect the growth rate of weak
instabilities like Tollmien-Schlichting waves. So far, we have
found two ways of remedying this:

-  apply a high-pass filter onto the resolved velocity field
before computing its structure function. With a
triply-iterated second-order finite-difference Laplacian filter
denoted $\tilde{\ }$, one finds $\tilde E(k)/E(k) \approx 40^3\,(k/k_c)^9$ for all $k$,
almost independently of the velocity field and resolution.
With the same arguments as for the structure-function
model, this yields the filtered structure-function model,
defined by (Ducros et al., 1995)

$$\nu_t^{FSF}(\mathbf{x}, t) = 0.0014\; C_K^{-3/2}\; \Delta\; \sqrt{\tilde F_2^{\Delta}(\mathbf{x}, t)}. \qquad (19)$$

This model enabled Ducros to perform the LES of a
spatially-growing boundary layer (at Mach 0.5) between
$Re_x = 3.3 \times 10^5$ and $1.14 \times 10^6$, which widely encompasses the
transition region, for a cost of about 80 hours of Cray 2. With
the first mesh line at $y^+ \approx 3$ (i.e. with just one point in
the viscous sublayer) and only 32 points along $y$, statistics
were found to be within 20% agreement with experimental
data, as in Fig. 1 top.


-  switch the original structure-function model off when
the flow is not three-dimensional enough in the small
scales (David, 1993). In practice, an average vorticity vector
$\bar\omega(\mathbf{x}, t)$ is computed over $\mathbf{x}$ and its (4 or 6) closest
neighbours. The structure-function model is applied only
if the magnitude of the angle $\alpha = (\omega(\mathbf{x}, t), \bar\omega(\mathbf{x}, t))$ exceeds
a certain threshold $\alpha_0$. Simulations^ of incompressible
isotropic turbulence at resolutions ranging between $32^3$ and
$64^3$ gave pdf's of $|\alpha|$ peaking around 20°. Having found
the choice of $\alpha_0$ not critical between 10 and 45°, we finally
retained $\alpha_0 = 20°$. The model's constant was finally
set to 1.56 times that of the SF model, a least-squares fit
between our test simulations yielding the same average
dissipation as the SF-model. Dispersion was found small
enough to justify this in first approximation, but a lot
of work has yet to be done to reduce the arbitrariness
in this model. In any case, the most surprising conclusion
about the filtered and selective structure-function models
(hereafter FSF and SSF, respectively) is that they can
be interchanged without much difference in the results
(Comte et al., 1994). This comes from the fact that they
both considerably shrink the support of $\nu_t$ (with respect
to that of the original SF model), and that both supports
are almost the same (Fig. 2, middle and bottom plots).
In any case, they do not react to $\Lambda$-vortices, whereas the
SF model does (Fig. 2 top).
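The selection criterion of the second remedy can be sketched as follows. The example works on one grid point at a time: it takes the local vorticity vector, the vorticity vectors of its closest neighbours and the eddy viscosity already given by the structure-function model. Treating a vanishing vorticity as "not three-dimensional" is an assumption of this sketch, and all names are hypothetical.

```python
import math


def angle_deg(a, b):
    """Angle in degrees between two 3-vectors (0 if either vector vanishes)."""
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0  # assumption: no selection where vorticity vanishes
    cosine = sum(x * y for x, y in zip(a, b)) / (norm_a * norm_b)
    return math.degrees(math.acos(max(-1.0, min(1.0, cosine))))


def selective_eddy_viscosity(nu_sf, omega, omega_neighbours,
                             alpha0_deg=20.0, factor=1.56):
    """Selective structure-function model at one grid point.

    nu_sf            : eddy viscosity from the original SF model at this point
    omega            : local vorticity vector (3 components)
    omega_neighbours : vorticity vectors at the 4 or 6 closest neighbours
    The SF viscosity, rescaled by `factor`, is kept only where the local
    vorticity departs from the neighbourhood average by more than alpha0_deg.
    """
    pts = [omega] + list(omega_neighbours)
    mean = [sum(p[c] for p in pts) / len(pts) for c in range(3)]
    if angle_deg(omega, mean) > alpha0_deg:
        return factor * nu_sf
    return 0.0
```

In a locally two-dimensional region all vorticity vectors are aligned, the angle is zero and the model is switched off, which is the behaviour sought for transitional flows.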


Fig. 2 - From top to bottom: isosurfaces $\nu_t = 2/3\,\nu$ given
by the SF, FSF and SSF models, respectively, in the transitional
portion of a spatially-growing boundary layer at Mach
0.5 simulated with the FSF model (Ducros and Ducros et al.,
1995, or Comte et al., 1994). The same velocity field was used
for the three plots (a priori test).

^LES with the original structure-function model

4. A REFERENCE CASE: THE INCOMPRESSIBLE
MIXING LAYER SIMULATED WITH
SPECTRAL-LIKE METHODS

The scheme described above is deliberately dissipative, in
order to make it robust: as an example, we will show in
the next section an application of it in the case of a
solid-propellant booster. Let us also mention the flow over a
compression ramp at Mach 2.5 or the boundary layer at
Mach 4.5 that we briefly presented at the 74th AGARD
FDP (see Comte & David, 1995 and Ducros et al., 1993
for more details). The price to pay for this robustness is a
certain numerical dissipation (let alone the numerical
dispersion), which is difficult to measure. In the absence of
really conclusive analytical arguments, one way around
this is to perform comparisons with results obtained from
numerical methods known for their precision, such as the
spectral or collocation methods.

In  Comte  et  al  (1992),  we  presented  a  comparison  between 
two  pseudo-spectral  DNS  of  incompressible  mixing  layers 
at  Reynolds  number^  100  differing  only  by  the  nature  of 
the  initial  perturbations.  In  one  case,  these  were  made  of 
a  mixture  between  2D  fluctuations  (energy  10“^  17^)  and 
3D  fluctuations  of  energy  The  result  was  the 

formation of quasi-2D Kelvin-Helmholtz vortices undergoing
pairings and stretching weak hairpin vortices between
one another. The spectra measured were in $\approx$ k~^
or k~*, even after the second pairing, and vorticity
remained bounded by its maximal initial value $\omega_i$. In the
other case (same 3D fluctuations as before, but of energy
10~^ $U^2$ and without 2D perturbations) helical pairings
were observed, with more energy in the small scales (spectra
in A;-^^®), and all components of vorticity reaching
about $3\,\omega_i$.

These simulations were repeated in LES without molecular
viscosity (Silvestrini, 1993). In both cases, we observed
the same large-scale vortex pattern as in DNS,
but with more numerous and intense small-scale vortices
($\max \omega_x \approx 6\,\omega_i$). The difference in the statistics between
the two cases was smaller than in DNS, although the case
with 3D perturbations only remained more turbulent.

We now present the same kind of comparison in a
spatially-growing configuration, for a velocity ratio
$\lambda = (U_1 - U_2)/(U_1 + U_2) = 1/2$. Sixth-order accurate compact schemes
are used along $x$ with radiative outflow boundary conditions,
and pseudo-spectral methods on $yz$ planes assuming
periodicity along $z$ (spanwise) and free-slip along $y$
(code written by Gonze, 1993, and recently parallelized
on Cray T3D by means of slab $\leftrightarrow$ pencil transpositions
under PVM). In all cases the computational grid is uniform
with cubic meshes. Fig. 3 corresponds to a DNS in
a domain of size $L_x = 140\,\delta_i$, $L_y = 28\,\delta_i$, $L_z = 14\,\delta_i$ for a
resolution $480 \times 96 \times 48$. The upstream Reynolds number
is 100, as in Comte et al. (1992), which corresponds
approximately to the maximal value permitted at this
resolution. The upstream forcing is a mixture between 2D
noise on the plane $x = 0$ and noise in the transverse direction
$y$ only, of respective energies $\varepsilon_{2D} U^2$ and $\varepsilon_{1D} U^2$ with
$\varepsilon_{2D}$ = 10~^ and $\varepsilon_{1D}$ = 10~®, that is, 3% in turbulent
intensity.

^based upon $U = (U_1 - U_2)/2$ and $\delta_i$, half the velocity
difference and the initial vorticity thickness, respectively


Fig. 3 - Surface $\|\omega\| = 1/3\,\omega_i$, in DNS at $Re = 100$. Peak
vorticity recorded here is $2\,\omega_i$.

When the simulation is repeated at zero molecular viscosity
with the FSF model in its 6-neighbour formulation, transition
is obtained farther upstream than before, even with weaker forcing:
Fig. 4 is obtained with $\varepsilon_{2D}$ = 10~* and $\varepsilon_{1D}$ = 10“^, in a
shorter domain than before ($L_x = 112\,\delta_i$, with only 384
points to keep the meshes cubic), other dimensions and
number of collocation points being unchanged. The isosurface
threshold is twice as large as before. Vorticity magnitude now
peaks at $4\,\omega_i$, which is compatible with the high-Reynolds-number
experiments of Huang & Ho (1990). This maximum
is reached where streamwise vortices wrap around
the primary billows.


Fig. 4 - Time evolution of the surface $\|\omega\| = 2/3\,\omega_i$, in a
LES at $\nu = 0$, with $\varepsilon_{2D}$ = 10“® and $\varepsilon_{1D}$ = 10“^. Note the
good behaviour of the outflow boundary conditions.

Investigating sensitivity to the nature of the upstream
perturbations would not be pertinent in such a narrow
domain^. We thus doubled $L_z$ and its corresponding number
of collocation points. This should not change things
much in the quasi-2D case ($\varepsilon_{2D}$, $\varepsilon_{1D}$) = (10“®, 10“^).
However, with ($\varepsilon_{2D}$, $\varepsilon_{1D}$) = (10“^, 0), helical pairings are
observed in the wider domain (Fig. 5). Interested
readers are referred to Silvestrini et al. (1995) for more
details.


Fig. 5 - Surface $\|\omega\| = 2/3\,\omega_i$, in LES at $\nu = 0$, with
$\varepsilon_{2D}$ = 10“^ and $\varepsilon_{1D} = 0$.


^ “Spanwise correlation lengths are of the order of 3 - 5 $\delta_\omega$
($\delta_\omega$ is the local vorticity thickness). However, the large vortices
typically have lengths of order 20 $\delta_\omega$ when the irregularities
along the span are ignored” (quoted from Browand & Troutt,
1985).


5. AN INDUSTRIAL APPLICATION: THE VORTEX
SHEDDING INSIDE A SOLID ROCKET
ENGINE

We are participating in an operation set up by CNES and
ONERA concerning the control of the vibrations induced
by vortex shedding within the solid-propellant boosters
of the future launcher ARIANE V. We show below preliminary
simulations performed with the code described
above, in a simplified planar test case, with the grid shown
below (Fig. 6).


Fig. 6 - Grid of the C1 test case (length $L = 0.47$ m, radius
$H = 0.045$ m, resolution $318 \times 31$ points)

The step is made of burning propellant, at a flame temperature
of 3387 K and a mass flow rate, normal to
the walls, of 21.2 kg/m²/s. Pressure $p = 4.66$ bar is
prescribed at the upstream end. The outlet is a nozzle
and the outflow boundary conditions are supersonic. The
burnt gases are characterized by the following parameters:
$\gamma = 1.14$, $R = 299.53$ J/kg/K, $\mu_{mol} = 9 \times 10^{-5}$ Pl
and $Pr = 1$.

With such values, 2D simulations are not possible without
flux limiters or artificial viscosities. With a viscosity 8
times as large, they become possible without such limiters,
and Figure 7 shows the resulting vortices, in time
evolution. In such a case, the code gives approximately
the same results as the second-order MacCormack code
SIERRA of ONERA (Lupoglazoff & Vuillot, 1992).

In 3D at the true viscosity and with the filtered structure
function model described above, the advantages of
the (2,4) scheme become evident. The following figures
correspond to a LES at a spanwise resolution of 90 points
equally spaced over the span $L_z = \pi H \approx 0.141$ m,
with periodic boundary conditions. The initial condition
consists of the 2D flow shown above, taken at a given instant
of the steady regime, with low-amplitude white noise
(of amplitude 10“*' the speed of sound at the surface of
the propellant) on all the components of $U$. Without this
perturbation, the flow would have remained 2D, which
proves that the code is not “noisy”. After having reached
the steady regime, which took 50 hours of Cray 90 at
450 Mflops (corresponding to Sms of real time), time series
are recorded for 5 ms. Figure 8 shows an animation
of an isosurface of the magnitude of the vorticity vector.
Streamwise vortices are not only visible in between
the large Kelvin-Helmholtz billows, as in the previous section,
but also at the wall of the nozzle. These are likely
to result from a Dean-Görtler instability of the detached
boundary layer, which re-attaches in the convergent part
of the nozzle (Fig. 9).

Fig. 7 - Contour maps of entropy at 5 equally spaced instants.

Fig. 8 - Streamwise vortices in a quasi-industrial
configuration

The statistics are in global agreement with the experimental
data. In particular, we found kinetic energy and
pressure spectra which exhibit a fundamental peak around
2500 Hz, and its successive harmonics. More precisely,
Figure 10 shows a comparison between the present LES and
the 2D calculation just above. In the 3D case, the fundamental
frequency is lower (2300 Hz versus 2670 Hz) and
the spectra are more developed, in particular at low
frequencies. This is of crucial importance for the design
of the anti-vibration protections of the rocket's control
systems, and illustrates the importance of taking
three-dimensionality into account, even when the largest
vortices are expected to be two-dimensional.
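As a rough illustration of how a fundamental frequency can be read off a recorded time series, the sketch below computes a plain periodogram with a direct discrete Fourier transform. The sampling step and the synthetic test signal are hypothetical; only the 5 ms record length is taken from the text. A record of that length gives a frequency resolution of 1/(5 ms) = 200 Hz, so the 2300 Hz and 2670 Hz peaks quoted above are less than two resolution bins apart.

```python
import cmath
import math


def frequency_resolution(record_length):
    """Spacing between periodogram frequencies for a record of given duration (s)."""
    return 1.0 / record_length


def dominant_frequency(signal, dt):
    """Frequency (Hz) of the strongest non-zero periodogram line of a real signal.

    A direct O(n^2) discrete Fourier transform is used, which is enough for
    the short records considered here.
    """
    n = len(signal)
    mean = sum(signal) / n
    x = [s - mean for s in signal]  # remove the mean so that bin 0 is ignored
    best_freq, best_power = 0.0, -1.0
    for q in range(1, n // 2 + 1):
        coeff = sum(x[t] * cmath.exp(-2j * math.pi * q * t / n) for t in range(n))
        power = abs(coeff) ** 2
        if power > best_power:
            best_freq, best_power = q / (n * dt), power
    return best_freq
```

For instance, a synthetic 5 ms record sampled every 25 microseconds and made of a 2400 Hz tone plus a weaker 4800 Hz harmonic is identified as having a 2400 Hz fundamental.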


Fig. 9 - Maps of the entropy field. The top view shows a cross
section of the Görtler vortices, the bottom one the streamwise
vortices which connect the KH billows.

Fig. 10 - Temporal kinetic energy spectra recorded in the
middle of the booster. The solid line corresponds to the LES
and the dashed line to the 2D DNS.


6.  CONCLUSION 

A progress report of our efforts towards the industrialization
of Large-Eddy Simulations has been presented. In
particular, it is shown that such simple algorithms as
5-point extensions of fully-explicit MacCormack schemes can
be very effective, and compete with spectral methods as
far as the description of fine vortical structures is concerned.
The importance of longitudinal Görtler-type vortices
has been shown in the case of the flow within a simplified
booster of Ariane V, in addition to the more dramatic
case of HERMES' body flap which was presented orally.
The academic simulation of incompressible mixing layers
has proved the sensitivity of LES to the nature of the
disturbances superimposed onto the basic flow, showing
that LES could be a good tool for receptivity studies in
aircraft and aerospace research. The next step of our
developments in this direction will deal with the adaptation
of our subgrid-scale turbulence models to complex
geometries, following in the footsteps of the Center for
Turbulence Research in Stanford (see e.g. Ghosal & Moin,
1995). Finally, our opinion about the role that numerical
dissipation should play in the turbulence-modelling
process is the following: the role of numerical dissipation
should be minimized, unless we have a way of controlling
it on physical grounds. Algorithms with non-linear
dissipation are available: for example, the PPM scheme of
Colella & Woodward (1984) is capable of satisfying the
second principle of thermodynamics and the positivity of
the thermodynamical variables with an amount of numerical
dissipation close to the minimum wherever it is not
needed. We disagree with the claim that Euler-PPM
calculations are LES (either MILES or BILES), because the
physics of turbulence has not been incorporated yet.
However, we think that this should be possible. Firstly, in
subsonic regions at least, its dissipation should be made
as little dependent on the grid orientation as possible.
Then, we should try to force this dissipation to equal the
value prescribed by a given subgrid-scale turbulence model.
Thus, explicit eddy-viscosity models might become
redundant one day. However, we think that the molecular
viscous terms should be kept in all simulations of
wall-bounded flows.


SUMMARY

In  this  paper,  we  present  recent  developments  in 
the  theory  and  application  of  lattice  Boltzmann 
techniques  and  related  lattice  BGK  models.  Lat¬ 
tice  based  methods  allow  the  study  of  complicated 
systems  with  simple,  efficiently  computable  phys¬ 
ical  models.  Here  we  will  report  some  progress 
with  these  methods  and  give  an  overview  of  their 
basic  ingredients.  Applications  to  various  types  of 
turbulent  flows  are  described. 

1  INTRODUCTION 

The Lattice Boltzmann Equation (LBE) is a direct
method to solve the Navier-Stokes equations on a
digital computer. LBE is rooted in Boolean lattice
gas techniques, a sort of "minimal" molecular
dynamics scheme based on the observation that
the large-scale dynamics of fluid flow is largely
independent of the details of the underlying
microdynamics. This suggests that in order to numerically
integrate the differential equations describing
the motion of a fluid, it may prove convenient
to use a population of microvariables ("particles")
whose microdynamics can be freely adjusted to
match the Navier-Stokes equations on a macroscopic
scale [5].

The LBE method takes this approach one step
forward towards the macroscopic world, from the
molecular to the kinetic level, by replacing the
Boolean microdynamical variables with their
corresponding floating point expectation values. This
move, while preserving the locality in space and
time of the evolution rules, which is key to the
amenability to parallel computing, offers three
main advantages: a better amenability to present-day
computing architectures (increasingly faster
on the floating point side); a wider degree of latitude
in choosing the details of the evolution rule;
a reduction of the separation in scale between the
micro-world and the macro-world (i.e. the averaging
operation on a suitable region of the microdynamical
lattice needed in Boolean simulations to
remove statistical noise is no longer necessary).


Paper presented at the AGARD FDP Symposium on “Progress and Challenges in CFD Methods and Algorithms”

The Lattice Boltzmann equation was introduced
in the late 80's to cope with the two major
drawbacks of the Lattice Gas Cellular Automata
(LGCA) technique: statistical noise and exponential
complexity of the evolution rule with the number
of degrees of freedom per lattice site.

Ever since, the method has gone from strength
to strength, up to the point where it can be put
on a par with the most advanced computational fluid
dynamics (CFD) techniques for a large variety
of problems, ranging from fully developed
homogeneous incompressible turbulence to multiphase
flows in porous media.

Besides its amenability to parallel computing,
the method is appreciated for the ease of
implementation of grossly irregular geometries, as well
as for the flexibility of the evolution rule, which
allows complex physics to be modeled by minor
modifications of the basic collisional scheme.

Despite these brilliant features, LBE has not yet
penetrated the CFD engineering community, the
primary hurdle being its inability to deal with
non-uniform, irregular mesh distributions.

This problem has been partially alleviated in
the recent past by importing finite-volume
techniques within LBE so as to produce a finite-volume
LBE capable of dealing with non-uniform
(structured) grids.

This paper is organized as follows: first we
present a cursory view of the LGCA and LBE
techniques, respectively. Subsequently we describe two
applications of LBE to the area of fluid
turbulence: three-dimensional Rayleigh-Benard
convection and three-dimensional channel flow
turbulence.

2  Lattice  Gas  dynamics 

The development of the lattice Boltzmann equation
(LBE) is intimately related to lattice gas cellular
automata (LGCA). Interest in LGCA originated
with the seminal paper of Frisch, Hasslacher
& Pomeau (1986), in which it is shown that a simple
automaton living on a 2D hexagonal lattice
can provide, in the limit of large-scale motion, a
faithful representation of 2D fluid dynamics [5].
In contrast to the 2D case, no 3D Bravais lattice
exists with enough symmetries to lead to 3D fluid
dynamics. A clever way out of this problem was
found by d'Humieres, Lallemand & Frisch (1986),
who pointed out that a suitable four-dimensional
lattice, the face-centered hypercubic (FCHC)
lattice, leads to the proper symmetries. To obtain
three (two) dimensional hydrodynamics, periodic
boundary conditions are imposed along the extra
direction(s) and the flow is projected into 3D (2D) [3].

The path leading from LGCA to the Navier-Stokes
equations is based on a standard procedure
of statistical mechanics: (1) to get from the particle
level to the Liouville level, an ergodic assumption
is used; (2) to get from the Liouville level
to the Boltzmann kinetic level, the assumptions
that collisions are instantaneous and localized in
space are involved; (3) to get from the Boltzmann
level to the Navier-Stokes continuum level, the
assumption that the particle mean free path is much
smaller than any macroscopic variation length is
made. The formal procedure to achieve the
hydrodynamic description of LGCA is based on a
multiscale formalism using the Knudsen number
as a small parameter.

The main advantages of LGCA are as follows:

•  Freedom from round-off error

•  Regular data structures, ideal for vector processing

•  Local interaction model, ideal for parallel processing

•  Ease of implementation of extremely irregular
geometries and boundary conditions

The price to be paid for these advantages is
reflected in the following disadvantages:

•  Statistical noise

•  Exponential complexity of the collision operator
with increasing number of states/site

•  Relatively high viscosity and therefore low
effective Reynolds numbers

The issue of statistical noise is a common feature
of all particle models; substantial space/time
averaging is required to extract reasonably smooth
hydrodynamic signals out of the LGCA microdynamics.
The issue of exponential complexity is
also typical of finite-state algorithms.

3  Lattice  Boltzmann  dynamics 

Lattice Boltzmann techniques provide a way out
of both of these problems. With the assumption
of molecular chaos, it is possible to write the
following kinetic equation:

$$N_i(x + c_i, t + 1) - N_i(x, t) = \Delta_i(N), \qquad i = 1, \ldots, b. \tag{1}$$

Here $N_i(x,t)$ is the ensemble-averaged number
density of particles of type $i$ lying at the lattice
point $x$ at time $t$ and propagating along the direction
identified by the discrete speed $c_i$. Also, $\Delta_i(N)$
is obtained from the boolean collision term by
simply replacing the stochastic boolean population
$n_i$ with the ensemble-averaged population
$N_i$. The problem of noise is absent in equation (1)
because $N_i$ is a real variable and no averaging
at all is needed to recover the macroscopic
fields. McNamara & Zanetti (1988) proposed to
use Eq. (1) directly for hydrodynamic simulations,
with the $\Delta_i$ arising from the corresponding
boolean models. In particular, they studied the
model defined by the FHP-III rules by simulating
the decay of shear and sound waves of finite
wavelength [10]. The comparison between the numerical
values and the Chapman-Enskog multiscale
predictions shows that the hydrodynamic value is
accurate to better than 5% even for a lattice as
small as 4. The behavior of sound waves is also
satisfactory.

The McNamara-Zanetti approach, while fixing
the problem of statistical noise, is still left with
the intractable complexity of the collision operator,
because all b-body interactions included in
the boolean collision term are still present. This
makes their approach unviable in more than two
dimensions.

Higuera & Jimenez (1989) [6] noticed that the
Lattice Boltzmann equation can be further simplified
without losing any generality in terms of
hydrodynamic fidelity. The reason is that macrodynamic
equations in LGCA formally arise in the
double limit of small Knudsen numbers and small
Mach numbers. It is then convenient to consider
the expansion of the collision term on the right
side of (1) corresponding to these conditions. To
do this, let us write $N_i$ as

$$N_i = N_i^{eq}(\rho, u) + N_i^{neq}(\nabla\rho, \nabla u), \tag{2}$$

and further decompose $N_i^{eq}$ as

$$N_i^{eq} = N_i^{(0)} + N_i^{(1)} + N_i^{(2)}, \tag{3}$$

where the upper index refers to the order in the
Mach number $M$. This expansion permits one to
express the collision operator in terms of a simple
2-body scattering matrix

$$\Delta_i(N) \simeq A_{ij}\,(N_j - N_j^{eq}), \tag{4}$$

where $A_{ij} = \partial\Delta_i/\partial N_j$, the derivatives being
calculated in the state of zero velocity $N_i = d = \rho/b$,
and summation over the repeated index $j$ is implied.
The element $A_{ij}$ controls the scattering rate
between directions $i$ and $j$, and $N_j^{eq}$ is the local
maxwellian equilibrium expanded to second order
in the local flow field.

Despite its apparent linearity, the expression (4)
accounts for second-order terms in the expansion
of the collision operator.

The Higuera-Jimenez LBE marks an important
breakthrough, as it opens the way to practical
three-dimensional simulations of fluid flows; as
a matter of fact, it turns a $2^b$-complex problem
(where $b$ is the number of bits at each lattice site)
into a $b^2$-complex one! The quasilinear LBE
introduced by Higuera & Jimenez is still in one-to-one
correspondence with its underlying LGCA
microdynamics. This sets a relatively strict upper
bound on the Reynolds number attainable, since
the LBE viscosity is exactly the same as that resulting
from the corresponding LGCA. Given that
one is ultimately interested just in the large-scale,
hydrodynamic features of the flow,
this appears to be an unnecessary restriction.

One is therefore naturally led to regard the
LBE as a self-standing model of the Navier-Stokes
equations, regardless of any underlying LGCA
dynamics (Higuera, Succi, Benzi (1989)) [7].

The starting point in the definition of the
'self-standing' lattice Boltzmann equation is again the
linearized kinetic equation (4). The change in
perspective is however substantial: the choice of the
quantities $A_{ij}$ and $N_j^{eq}$ in (4) is no longer dictated
by an underlying boolean microdynamics but is
rather adjusted to the macroscopic equations to be
reproduced. With this broader view, attention
shifts to the scattering matrix, and notably to
its leading non-zero eigenvalue, the one controlling
the viscosity of the LBE flow. This eigenvalue
can be tuned at the outset so as to achieve the
desired flow viscosity in a fairly handy fashion.

4  Lattice  BGK  models 

In a similar vein, Bhatnagar, Gross & Krook
(1954) used a relaxation approximation to model
the effect of complicated collisions [2]. The basic
formulation of lattice BGK models can then
be described as a simplified Boltzmann equation,
starting from the time evolution equation

$$N_i(x + c_i, t + 1) = N_i(x,t) + \omega\,[N_i^e(x,t) - N_i(x,t)], \tag{5}$$

where $\omega$ is a relaxation parameter (the collision
frequency in kinetic theory). The key point here
is the choice of the equilibrium state $N_i^e$ so that
it leads to the exact Navier-Stokes equation at
hydrodynamic space and time scales. The right
choice is

$$N_i^e = \rho\, t_p \left[ 1 + \frac{c_{i\alpha} u_\alpha}{c_s^2} + \frac{u_\alpha u_\beta\,(c_{i\alpha} c_{i\beta} - c_s^2 \delta_{\alpha\beta})}{2 c_s^4} \right],$$

where $c_s$ is the speed of sound, and $t_p$ are weights
depending on the square amplitude $p$ of the velocity
(since particles are either at rest or move one grid
site per timestep, $p$ is an index from 0-2 in 2D,
0-3 in 3D, which labels particles at rest, in motion
along, or in motion diagonal to the grid).
Requirements of isotropy and Galilean invariance impose
constraints on the weights $t_p$ which are model
dependent (Qian and Orszag (1993) [11]).
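
The relaxation rule (5) with a second-order equilibrium can be sketched in a few lines. The following is only a minimal illustration, not the authors' code: it assumes the two-dimensional nine-velocity lattice with the commonly used weights $t_0 = 4/9$, $t_1 = 1/9$, $t_2 = 1/36$ and $c_s^2 = 1/3$, a fully periodic box, and hypothetical function names (`equilibrium`, `moments`, `bgk_step`).

```python
import numpy as np

# Assumed lattice: one rest particle, 4 axis links (p = 1), 4 diagonal links (p = 2).
C = np.array([[0, 0], [1, 0], [0, 1], [-1, 0], [0, -1],
              [1, 1], [-1, 1], [-1, -1], [1, -1]])
T = np.array([4 / 9] + [1 / 9] * 4 + [1 / 36] * 4)  # assumed weights t_p
CS2 = 1.0 / 3.0                                     # assumed squared sound speed


def equilibrium(rho, u):
    """Second-order equilibrium; rho has shape (nx, ny), u has shape (nx, ny, 2)."""
    cu = np.einsum("id,xyd->ixy", C, u)   # c_i . u
    uu = np.einsum("xyd,xyd->xy", u, u)   # |u|^2
    # u_a u_b (c_ia c_ib - cs^2 delta_ab) = (c_i . u)^2 - cs^2 |u|^2
    return rho * T[:, None, None] * (
        1.0 + cu / CS2 + (cu ** 2 - CS2 * uu) / (2.0 * CS2 ** 2))


def moments(N):
    """Density and velocity as the zeroth and first moments of the populations."""
    rho = N.sum(axis=0)
    mom = np.einsum("id,ixy->xyd", C, N)
    return rho, mom / rho[..., None]


def bgk_step(N, omega):
    """One update: relax towards equilibrium, then shift each population along c_i."""
    rho, u = moments(N)
    post = N + omega * (equilibrium(rho, u) - N)
    return np.stack([
        np.roll(post[i], shift=(int(C[i, 0]), int(C[i, 1])), axis=(0, 1))
        for i in range(len(C))])
```

Because the equilibrium carries the same density and momentum as the populations it replaces, and the shift is periodic, both total mass and total momentum are conserved by `bgk_step` up to round-off.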


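A two-scale analysis in time leads to the effective
hydrodynamic equations at second order in
the Knudsen number (the ratio of mean free path
to characteristic length):

$$\partial_t \rho + \partial_\alpha(\rho u_\alpha) = 0,$$

$$\partial_t(\rho u_\alpha) + \partial_\beta(\rho u_\alpha u_\beta) = -\partial_\alpha(c_s^2 \rho) + \nu\,\partial_\beta[\rho(\partial_\beta u_\alpha + \partial_\alpha u_\beta)], \tag{6}$$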


where $c_s$ is the sound speed and $\nu$, the shear viscosity,
is given by

$$\nu = c_s^2 \left( \frac{1}{\omega} - \frac{1}{2} \right).$$


Also, the incorporation of an eddy viscosity model
is quite straightforwardly accomplished through
the introduction of a space- and time-dependent
relaxation parameter $\omega$. The Smagorinsky formula
for the eddy viscosity, for example, becomes
simply

$$\frac{1}{\omega} = \frac{2 C_l |S| + c_s^2}{2 c_s^2}, \tag{8}$$

where $|S|$ is the amplitude of the strain tensor
and $C_l$ is a constant.
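
As a small worked check of formula (8), the sketch below converts a local strain amplitude into a local relaxation parameter and back into a viscosity. It assumes the usual lattice BGK relation $\nu = c_s^2 (1/\omega - 1/2)$ and the value $c_s^2 = 1/3$; the function names are hypothetical.

```python
CS2 = 1.0 / 3.0  # assumed squared lattice sound speed


def viscosity(omega, cs2=CS2):
    """Shear viscosity nu = cs^2 (1/omega - 1/2) (assumed lattice BGK relation)."""
    return cs2 * (1.0 / omega - 0.5)


def smagorinsky_omega(strain_norm, c_l, cs2=CS2):
    """Local relaxation parameter from 1/omega = (2 C_l |S| + cs^2) / (2 cs^2)."""
    return 2.0 * cs2 / (2.0 * c_l * strain_norm + cs2)
```

Under these assumptions `viscosity(smagorinsky_omega(s, c_l))` returns `c_l * s`, i.e. the relaxation parameter of (8) encodes an eddy viscosity $C_l |S|$, and $\omega$ approaches 2 (vanishing viscosity) as $|S| \to 0$.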

A nice property of LBE is that the strain tensor
$S_{\alpha\beta}$ is available locally as an appropriate linear
combination of the particle populations $N_i$. Other
eddy viscosity models may be implemented in a
similar way. The inclusion of standard wall conditions
for the eddy viscosity is equally straightforward.

From a numerical point of view the LBE is basically
an explicit finite-difference scheme working
at the edge of the Courant-Friedrichs-Lewy condition
$c \Delta t = \Delta x$ and bearing a significant
resemblance to the DuFort-Frankel scheme. It is
characterized by a favorable computation/communication
ratio, which is key to its amenability to parallel
implementations across virtually the whole spectrum
of present-day parallel computers. This favorable
ratio is achieved at the expense of some
extra memory and CPU overhead (the number of
discrete populations exceeds the number of significant
hydrodynamic fields) as compared to standard
explicit CFD schemes.


5  Applications 

Many applications of lattice BGK methods to diverse
fluid flows have been and are being made; for a
recent review see (Qian, Succi and Orszag, 1995
[12]).

Here we shall cursorily review two recent applications:
three-dimensional Rayleigh-Benard thermal
convection and three-dimensional channel
flow turbulence.

5.1  Three-dimensional thermal convection

Recently, the LBE formalism has been extended in
such a way as to handle thermal convection by
including the dynamics of a temperature field within
the fluid flow (Massaioli, Succi, Benzi, 1993) [8].

The thermal LBE code has been extensively
exploited to gain new insights into a number of issues
related to thermal turbulence, such as the shape
of the probability distribution function of velocity
and temperature fluctuations and the related
implications for the scaling properties of thermal
turbulence.

Perhaps the most valuable outcome of these
simulations is a clue to the nature of turbulent
flows which now goes by the name of "Extended
Self-Similarity" (ESS). ESS represents a kind
of generalized scale invariance which apparently
also holds in the limit of low Reynolds numbers,
i.e. when dissipation still plays a non-negligible role
in the flow dynamics (Benzi, Ciliberto, Massaioli,
Tripiccione and Succi, 1993) [4].

The basic statement of ESS is that the scaling
properties of a turbulent flow are most conveniently
highlighted by inspecting the structure functions
one versus another, rather than as a function of the
space separation $r$, as suggested by common
practice.

In particular, the scaling exponents $\zeta_p$ can be
derived by measuring the $p$-th order structure
function $S_p(r)$ in terms of $S_3(r)$ according to the
following relation:

$$S_p(r) = [S_3(r)]^{\zeta_p}, \tag{7}$$

where $S_p(r)$ is defined as

$$S_p(r) = \langle |u(x + r) - u(x)|^p \rangle,$$

angular brackets denoting ensemble averaging.

According to the standard K41 Kolmogorov theory,
in the scaling regime (Reynolds number going
to infinity) the 3rd-order structure function $S_3(r)$
becomes a linear function of the space separation
$r$, which is why scaling is commonly probed by
log-plotting the structure functions $S_p$ versus $r$. The
problem with this procedure is that the Reynolds
numbers achievable by direct simulation of the
Navier-Stokes equations on present-day computers
are not high enough to attain a fully developed
scaling regime. The result is that a clear-cut
measurement of the scaling exponents is hampered by
statistical inaccuracies.

The point of ESS is that eq. (7) holds even
for moderately low Reynolds numbers, for which
the Kolmogorov relation $S_3(r) \sim r$ does not apply,
whence the denomination of "extended"
self-similarity.

The practical implication is that scaling exponents
can then be reliably measured from
moderate-Reynolds-number simulations, well
within reach of present-day computational
capabilities ($128^3$ grid points being perfectly
adequate).

The validity of the ESS assumption is currently
being explored for a variety of different flows,
including Rayleigh-Benard turbulence,
magnetohydrodynamics and others.

In the specific instance of Rayleigh-Benard convection,
ESS has made it possible to gather a wide body
of numerical evidence in favor of 'buoyancy-driven'
Bolgiano scaling, i.e. an energy spectrum scaling like
$E(k) \sim k^{-11/5}$, as opposed to 'non-thermal'
Kolmogorov decay $E(k) \sim k^{-5/3}$ [8].
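
The ESS measurement described above amounts to a straight-line fit of $\log S_p$ against $\log S_3$. The sketch below illustrates that procedure under simplifying assumptions that are not from the paper: a one-dimensional periodic signal, a spatial average standing in for the ensemble average, and hypothetical function names.

```python
import numpy as np


def structure_function(u, r, p):
    """S_p(r) = <|u(x + r) - u(x)|^p>, averaged over a periodic 1D signal."""
    return np.mean(np.abs(np.roll(u, -r) - u) ** p)


def ess_exponent(u, separations, p):
    """Slope of log S_p versus log S_3 over the given separations."""
    s3 = np.array([structure_function(u, r, 3) for r in separations])
    sp = np.array([structure_function(u, r, p) for r in separations])
    slope, _intercept = np.polyfit(np.log(s3), np.log(sp), 1)
    return slope
```

For a smooth signal such as a single sine wave the increments scale linearly with $r$, so the fit returns a slope close to $p/3$; departures from $p/3$ in turbulence data are what the ESS analysis is meant to expose.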

5.2  Three-dimensional turbulent channel flow

As mentioned in the introduction, the LBE has
recently been merged with the finite-volume
method to produce a variant of LBE (FVLBE
hereafter) able to deal with non-uniform grids.

The idea is to take the differential form of
LB dynamics and apply a finite-volume procedure
based upon integration of eq. (1) on each cell of a
control grid of (almost) arbitrary shape.

By straightforward use of the Gauss theorem, we
obtain

$$\partial_t F_{i,c} + \Phi_{i,c} = \Delta_{i,c}, \tag{8}$$

where $F_{i,c}$ is the mean population of the macrocell
$c$, $\Phi_{i,c}$ the corresponding flux across the
boundaries of $c$, and $\Delta_{i,c}$ is the rate of change
of $F_{i,c}$ due to collisions occurring within the cell
$c$. Clearly, the actual computation of surface fluxes
involves an interpolation technique. For the case
in point, piecewise linear interpolation is used, so
that locality is preserved to a good extent in the
numerical scheme.

This scheme has been validated for the case of
three-dimensional turbulent channel flow simulation
on a moderate-resolution grid ($64 \times 64 \times 128$)
spanning a physical channel of height $H = 192$,
length $L_x = 960$, and width $L_y = 512$, i.e. pretty
close to the one examined by Moin and coworkers [9].

The idea is to reproduce the well-known logarithmic
law of the wall for the mean flow profile:

•  $u_x(z) = \dfrac{v_*^2}{\nu}\, z$;  $0 < z < \delta$

•  $u_x(z) = \dfrac{v_*}{\kappa} \log \dfrac{v_* z}{\nu} + v_* d$;  $z > \delta$

where $\kappa = 0.4$ is the Von Karman constant,
$v_*$ a typical turbulent velocity, $d$ is a calibration
constant, and $\delta$ is the thickness of the
"viscous sublayer".

The average velocity profiles drawn from the numerical
simulation are checked against the above
expressions to produce best-fit values $\nu^n$, $v_*^n$, $d^n$,
where the superscript $n$ denotes 'numerical simulation'
(see Figure 1).

The actual values of $\nu^n$, $v_*^n$, $d^n$ are derived from
the slope of the linear plot $u_x$ vs $z$ ($v_*^2/\nu$), the
slope of the plot $u_x$ vs $\log(z)$ ($v_*/\kappa$) and the
value of the logarithmic profile at $z = 1$
($v_*/\kappa \cdot \log(v_*/\nu) + d\,v_*$).
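
The three-parameter extraction just described can be written as two small least-squares fits. The sketch below is only an illustration under stated assumptions: the caller has already decided which samples lie in the linear sublayer and which in the logarithmic region, the profile has the form $u_x = (v_*^2/\nu) z$ and $u_x = (v_*/\kappa)\log(v_* z/\nu) + v_* d$, and the function name `fit_wall_law` is hypothetical.

```python
import numpy as np

KAPPA = 0.4  # Von Karman constant, as quoted in the text


def fit_wall_law(z_lin, u_lin, z_log, u_log, kappa=KAPPA):
    """Return (v_star, nu, d) from sublayer samples and log-region samples."""
    # Sublayer: least-squares line through the origin, slope = v_star^2 / nu.
    slope_lin = np.sum(z_lin * u_lin) / np.sum(z_lin ** 2)
    # Log region: u_x = slope_log * log(z) + intercept, slope_log = v_star / kappa.
    slope_log, intercept = np.polyfit(np.log(z_log), u_log, 1)
    v_star = kappa * slope_log
    nu = v_star ** 2 / slope_lin
    # intercept = (v_star / kappa) * log(v_star / nu) + d * v_star
    d = (intercept - slope_log * np.log(v_star / nu)) / v_star
    return v_star, nu, d
```

On noise-free synthetic data generated from the two expressions, the routine returns the input parameters; with simulation data the scatter of the fitted values over many sampled profiles gives the quoted error bars.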

The main outcome of these simulations is that
turbulence is supported during the entire life span
of the simulation, that is $2.4 \times 10^5$ time steps,
corresponding to about 90 longitudinal transit times.
This is due to the fact that the channel is long
enough to support streamwise rolls feeding
cross-channel turbulence.

Data samples have been collected every 53 steps
in the interval [100,000, 240,000], thus yielding
about 2600 profiles for statistical data analysis.
The numerical best-fit values deduced from the
simulation are as follows: $\nu^n = 0.013 \pm 0.002$,
$v_*^n = 0.013 \pm 0.001$, $d^n = 6.5 \pm 0.7$.

First, we remark that the measured viscosity
is about twice as high as the theoretical input
value $\nu = 0.05$. This is attributed to localized
peaks of numerical viscosity occurring where
the lattice pitch changes due to the mesh
non-uniformity (a sharp 1-2-1 mesh distribution along
$z$ has been adopted).

Second, we note that $d^n$ is within the error
bars provided by the literature, although somewhat
on the upper side. Finally, since turbulence
is sustained for a significant time span, wall
stress-tensor statistics are also available for the purpose of
internal consistency checks. This yields:

$$v_* = \sqrt{\langle u_x u_z \rangle}\,\Big|_{z=0} \simeq 0.012 \tag{9}$$

in pretty good agreement with the values deduced
from the velocity profiles (Figure 2).

To sum up, these moderate-resolution runs suggest
that the FVLBE scheme provides results well
within the error bars of current CFD at quite a
comparable computational cost (10 μs per grid
point per step on an IBM RS 6000 mod. 580
workstation).

Further work is needed to judge its
competitiveness on more quantitative grounds.

6  Conclusions 

In summary, Lattice Boltzmann methods provide
a complementary numerical approach to traditional
numerical methods for complex nonlinear
systems. Benchmark problems have validated
the approach as a flexible and efficient numerical
method. Fruitful applications are being made to
multiphase flow simulations, subgrid modeling of
turbulence and non-uniform lattice applications.

Figure 1. Mean velocity profile for a turbulent channel flow using moderate
resolution, with a 1-2-1 lattice; the dotted lines represent the maximum
and the minimum value of the theoretical velocity profile, computed with the
viscosity derived by the numerical experiment: Reynolds number $R \simeq 3000$,
viscosity $\nu = 0.013 \pm 0.002$, typical velocity $v_* = 0.014 \pm 0.001$.

1  As a consistency check, we compute $v_* = [\tau/\rho]^{1/2}|_{z=0}$ and we obtain

INTRODUCTION

It is well known that two types of transition are possible in the boundary layer: natural
and 'bypass' transition (see the review of A.M. Savill [14]). The first type of transition is
observed in the artificial case of low free-stream turbulence; 'bypass' transition usually
takes place in real technical equipment: aircraft, turbine engines, etc. Theoretical
investigations of both types of transition raise such difficult questions as the problem of model
construction and the problems of accurate and effective space and time resolution.

Known models can be divided into two groups: semi-empirical models (for instance, the
Savill-Launder-Younis model [15] (1995)) and models based on a reduction of the initial-value
and boundary problem for the Navier-Stokes equations (the addition of an artificial mass-force
term adopted by Laurien E. & Kleiser L. [11] (1989), the Parabolised Stability Equations
model, which was designed by Bertolotti F.P., Herbert Th. & Spalart P.R. [1] (1992), and the
'fringe' model suggested by P.R. Spalart [17] (1993)). We now describe one model of the
second type, namely the Slow and Fast disturbances interaction Model (SFM) designed
by V.S. Chelyshkov [6] (1993). The model is based on the assumption that an interaction of slow
and fast disturbances in the longitudinal coordinate is possible in such weakly non-parallel
flows as non-gradient and gradient boundary layers, jets and wakes. This idea
has been developed in recent years in the papers [4, 5, 6, 8] (see also the review by V.T. Grinchenko
& V.S. Chelyshkov [9]). The approach is valid for 3-D flows, but for simplicity we shall
consider the 2-D boundary layer near a semi-infinite flat plate.

THE  MODEL  OF  DISTURBANCE  INTERACTION 

It is known that two scales of flow in the longitudinal coordinate (slow and fast) are possible
near a flat plate. Blasius flow is a slow (weakly non-parallel) flow. Two-dimensional
perturbations of Blasius flow are divided into two types: slow undamped perturbations,
which control the boundary layer thickness [12], and fast non-stationary perturbations
[16]. Both types of perturbations must depend on the slow longitudinal coordinate, but
experimental and theoretical investigations show that we can neglect this dependence for
the second ones [16]. The self-interaction of fast disturbances results in fast and slow disturbances.
The latter contribute to the weakly non-parallel component of the flow. The SFM is
therefore constructed as follows. Let $l$ be the distance from the leading edge to a fixed point on
a flat plate, $U_\infty$ the free-stream velocity, $\rho$ the fluid density, $\nu$ the kinematic viscosity
coefficient. Cartesian coordinates $(x', y')$ are introduced to describe the 2D non-stationary
flow, which depends on time $t'$. The origin of these coordinates coincides with the leading
edge, and the $x'$-axis is directed along the plate. The velocity vector components are designated
as $u', v'$ in this coordinate system; $p'$ is the pressure. We choose the non-dimensional
variables using the formulae:

$$x' = l X_0, \quad y' = \lambda l y, \quad t' = l U_\infty^{-1} T, \quad u' = U_\infty u, \quad v' = U_\infty \lambda v,$$

$$p' = \rho U_\infty^2 p, \quad \kappa = 1.72078766, \quad \lambda = \kappa \left( \frac{\nu}{U_\infty l} \right)^{1/2}.$$

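Then the velocity vector field and the pressure

$$F = \{u, v, p\}(X_0, y, T)$$

are described by the Navier-Stokes and continuity equations

$$\partial_T u + u\,\partial_{X_0} u + v\,\partial_y u = -\partial_{X_0} p + \frac{1}{\kappa^2} (\partial_{yy} + \lambda^2 \partial_{X_0 X_0})\,u,$$

$$\lambda^2 (\partial_T v + u\,\partial_{X_0} v + v\,\partial_y v) = -\partial_y p + \frac{\lambda^2}{\kappa^2} (\partial_{yy} + \lambda^2 \partial_{X_0 X_0})\,v, \tag{1}$$

$$\partial_{X_0} u + \partial_y v = 0.$$

Equations (1) need suitable initial-value and boundary conditions in the flow domain,
which is not yet defined. The boundary conditions

$$u|_{y=0} = v|_{y=0} = 0, \qquad (u - 1)|_{y \to \infty} = v|_{y \to \infty} = 0 \tag{2}$$

are set on the flat plate and far from the wall. The Poisson equation for the pressure

$$-(\partial_{yy} + \lambda^2 \partial_{X_0 X_0})\,p = 2 \lambda^2 \left( (\partial_{X_0} u)^2 + \partial_y u\,\partial_{X_0} v \right) \tag{3}$$

follows from equations (1).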
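
The step from the momentum and continuity equations to the pressure equation (3) can be written out explicitly. The short derivation below is a sketch that assumes the scaled system in the form $\partial_T u + u\,\partial_{X_0} u + v\,\partial_y u = -\partial_{X_0} p + \kappa^{-2} L u$, $\lambda^2(\partial_T v + u\,\partial_{X_0} v + v\,\partial_y v) = -\partial_y p + \lambda^2 \kappa^{-2} L v$, $\partial_{X_0} u + \partial_y v = 0$, with the shorthand $L = \partial_{yy} + \lambda^2 \partial_{X_0 X_0}$ introduced here only for brevity.

```latex
% Apply \partial_{X_0} to the first equation and \lambda^{-2}\partial_y to the second, then add.
% The time-derivative and viscous terms act on \partial_{X_0}u + \partial_y v and vanish:
%   \partial_T(\partial_{X_0}u + \partial_y v) = 0, \qquad
%   \kappa^{-2} L(\partial_{X_0}u + \partial_y v) = 0 .
\begin{align*}
\partial_{X_0}\!\left(u\,\partial_{X_0}u + v\,\partial_y u\right)
 + \partial_y\!\left(u\,\partial_{X_0}v + v\,\partial_y v\right)
 &= (\partial_{X_0}u)^2 + 2\,\partial_y u\,\partial_{X_0}v + (\partial_y v)^2 \\
 &\quad + \left(u\,\partial_{X_0} + v\,\partial_y\right)\!\left(\partial_{X_0}u + \partial_y v\right) \\
 &= 2\left[(\partial_{X_0}u)^2 + \partial_y u\,\partial_{X_0}v\right],
 \qquad \text{using } \partial_y v = -\partial_{X_0}u . \\[4pt]
\text{Hence}\quad
 2\left[(\partial_{X_0}u)^2 + \partial_y u\,\partial_{X_0}v\right]
 &= -\partial_{X_0X_0}p - \lambda^{-2}\,\partial_{yy}p , \\
 -\left(\partial_{yy} + \lambda^2\partial_{X_0X_0}\right)p
 &= 2\lambda^2\left[(\partial_{X_0}u)^2 + \partial_y u\,\partial_{X_0}v\right].
\end{align*}
```

The last line is obtained by multiplying through by $\lambda^2$ and coincides with (3); no approximation in $\lambda$ is involved at this stage.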

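The parameter $\lambda$ is small far from the leading edge, and the Cartesian coordinates $(X_0, y)$ are
stretched in the transverse coordinate. When $\lambda \to 0$ the Blasius solution

$$F = F^B, \qquad F^B = \{u^B, v^B, p^B\}(X_0, y),$$

satisfies the Prandtl equations and the boundary conditions

$$u^B|_{y=0} = v^B|_{y=0} = 0, \qquad u^B|_{y \to \infty} = 1, \qquad |v^B|_{y \to \infty} < \infty. \tag{4}$$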

Thus the physical condition of damping of $v$ as $y \to \infty$ is replaced by a boundedness condition.
We define

$$X_0 = 1 + X, \quad X = \lambda x, \quad T = \lambda t, \quad Re = \kappa^2 / \lambda. \tag{5}$$

The disturbed flow field is considered in the half-band

$$\mathcal{D} = \{ -\pi/\alpha < x < \pi/\alpha, \ 0 < y < \infty \},$$

where $\alpha \sim O(1)$ is a parameter. Now we define the solution $F$ of the problem under
consideration in the following way:

$$F = F^B + F^s + F^f,$$

$$F^s = \{u^s, v^s, p^s\}(X_0, y, t), \tag{6}$$

$$F^f = \{u^f, v^f/\lambda, p^f\}(x, y, t), \qquad \mathbf{u}^f = \{u^f, v^f\}.$$

Here $F^s$ and $F^f$ are the vector fields describing slow and fast disturbances. We introduce
the $x$-average in $\mathcal{D}$

$$\bar{F} = \frac{\alpha}{2\pi} \int_{-\pi/\alpha}^{\pi/\alpha} F \, dx$$

and shall suppose that $\bar{F^f} = 0$. Substituting (6) into the initial problem (1)-(3) and discarding,
as in the laminar flow description, terms of order $O(\lambda^2)$, we obtain a system of equations
and boundary conditions. We add to the nonlinear equations, and subtract from
them, the $x$-average of the convective terms which contain fast disturbances. Now we can
conveniently separate all terms of each equation into two parts. We then
split these two parts and equate each of them to zero. The following problem is
obtained:

dtu^  +  +  {u^  +  u^)dxoU^  +  dyU^v^  -f  (u®  +  v^)dyU^)- 

Ke 

-  ^dyyU^  Nu{u^,u^,u^)  =  0,  (7) 


dxoU^  +  dyV^  =  0, 


y—o O5  w  |y_>oo O5  ^  ly— ^oo'‘^  CX3, 

(8) 

-  dyyP^  =  Np{u^,  U-^,  U^), 

(9) 

P  |j/— >oo —  ^yP  ly— >^00  O5 

(10) 

dtu^  +  Nu{u^,  u-^)  +  -  A^u(u^,  u^,  u7)  =  0, 


-f  dyV^  =  0, 


(11)
-  =  A^p(u^,  u-^,  u^)  -  Np{uB^  u^,  u^),

1j/=0=  |y=0=  O5 

ly— foo~  iy— *■00  ^yP  ly-^oo  (^^) 

Here  2 

A^^(u^,  u^,  u-^)  =  {u^  +  u^)dxu^  + 

j^—[v^  +  v^)dyu^  +  {dyU^  +  dyU^)v^  +  u^d^u^  + 

Re 

A^p(u®,  u^,  u-^)  =  “*"  5;i-oM^)5a:W^+  (14) 

^-2{dyU^  +  +  2((a^M^)^  +  dyu^d^rv^). 


In our opinion the SFM (7)-(14) describes near-wall flow in both cases of low and
high free-stream turbulence. The equations do not contain the second $y$-derivative of $v^s$. That
is why the physical condition of damping of $v^s$ far from the wall is replaced here, as in
(4), by a boundedness condition, and the solution of problem (7)-(14) will not be uniformly
applicable. The relationship

$$\partial_t v^s \sim O(\lambda)$$

is the condition of the model validity. This relationship cannot be established a priori,
but seems to be acceptable due to the weak dependence of $F^s$ on time. The natural conditions
of disturbance damping far from the wall, as in (2), have to be satisfied for the "fast
part" of the flow field. Substituting the Poisson equation for one of the Navier-Stokes equations
allows us to construct time discretization schemes without the need for a fractional step.
This approach also makes it possible to extend the solution algorithm to the 3D problem.

The values of the velocity vector components are unknown at the boundaries orthogonal
to the wall. We cannot introduce periodicity conditions at these boundaries because the
flow is weakly non-parallel. Following the idea of boundary-layer coherent structures [2],
we shall suppose that the flow is close to periodic in the longitudinal direction and

$$(F^s + F^f)|_{x = -\pi/\alpha} = (F^s + F^f)|_{x = \pi/\alpha} + O(1/Re).$$

To remove the slight arbitrariness in these boundary conditions we shall construct
solutions that depend on both longitudinal coordinates in a special way.

SPACE AND TIME DISCRETIZATION

Direct methods are applied for the discretization of the problem (7)-(14). The known forms
of the dependence of the perturbations on the longitudinal coordinate are used for the choice of trial functions:

j=0  j=o 

/=/(?/,  7  =  (15) 

{u^,  v^,  p^}{x,  y,t)=  ^k,  Pk}{y^  t)  exp{iakx).  (16) 

\k\<.K,k^0 

In (15) $\nu_0 = 0$, and the power indices $\nu_j$ ($j > 0$) are selected on the basis of the exponential damping of vorticity far from the wall, such as $\nu_1 = 1$, $\nu_2 = 1.887$, $\nu_3 = 2.867$, $\nu_4 = 3.8$, ... Now we can expand the first two terms in (6) into a Taylor series in X, substitute the

result  to  (7)  — (14)  and  throw  away  addends  of  the  order  of  o(Ax)  in  (11).  Using  (15), 
(16)  and  expanding  variable  \x  into  Fourier  series  in  (11)  we  can  separate  longitudinal 
coordinate  by  projection  equations  under  consideration  into  two  systems  of  test  functions: 

=  0, 1, . . . ,  and  exp(zamx),  m  ^  0. 

The sequences Xq and X^ are not orthogonal to each other on the interval over which they vary. This leads to numerical difficulties for the slow part of the solution when N is large, because a matrix of Hilbert type has to be inverted.
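
The ill-conditioning mentioned above can be illustrated numerically. The following is only a sketch and not the authors' code: it assumes, purely for illustration, that the trial functions are powers x**nu on the unit interval [0, 1], for which the Gram matrix has entries 1/(nu_i + nu_j + 1) and reduces to the classical Hilbert matrix for integer powers. The function name and the choice of interval are hypothetical.

```python
import numpy as np

def power_gram_matrix(exponents):
    """Gram matrix of x**nu_i on [0, 1]: G_ij = integral of x**(nu_i + nu_j) dx."""
    nu = np.asarray(exponents, dtype=float)
    return 1.0 / (nu[:, None] + nu[None, :] + 1.0)

# Integer powers 0..N-1 give the classical Hilbert matrix; its condition
# number grows very rapidly with the number of retained terms.
for n_terms in (4, 8, 12):
    gram = power_gram_matrix(np.arange(n_terms))
    print(n_terms, f"{np.linalg.cond(gram):.2e}")
```

The rapid growth of the printed condition numbers is the practical reason why only a small number N of such non-orthogonal terms can be retained.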

The next stage of approximation is the representation of the solution in the coordinate orthogonal to the wall on the interval [0, ∞). Far from the wall, the velocity and pressure field coefficients of the fast disturbances behave asymptotically as exp(-aky) for near-wall modes, where k > 0 is the Fourier harmonic number. Therefore, for the class of problems at issue, it is convenient to approximate the solution in the coordinate y by exponential polynomials (EP) orthogonal on the semi-axis with unit weight. Some computational
and/or  algorithmic  advantage  can  present  EP  Sn,k{y)  =  —  2exp{—y)) 

obtained  by  orthogonalization  of  exponential  sequence  in  inverse  order,  starting  from 
some  number  n  [3].  Here  are  Jacobi  polynomials.  These  polynomials  are  used  for 

solution  representation  in  coordinate  orthogonal  to  wall,  and  sequence  £n,k  is  filled  up  by 
unity to approximate the vertical velocity component of the slow disturbances. The final projection onto phase space is carried out by the Bubnov-Galerkin method, which allows one to use the 'boundary functions' [13] to satisfy the boundary conditions at the wall. For accurate numerical integration, Gauss quadrature formulae derived from the properties of the EP are applied, so 3n/2 points are used in the algorithm.
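
As an illustration of exponential polynomials orthogonal on the semi-axis with unit weight, the sketch below orthonormalizes the sequence exp(-ky), k = 1..n, and checks the result by Gauss quadrature. It is not the construction of [3]: the orthogonalization is done in forward order through a Cholesky factor of the Gram matrix, and the quadrature uses the substitution s = exp(-y) with Gauss-Legendre nodes, both of which are assumptions made here for simplicity. All function names are hypothetical.

```python
import numpy as np

def orthonormal_exp_coeffs(n):
    """Rows hold coefficients of orthonormal combinations of exp(-k*y), k = 1..n."""
    k = np.arange(1, n + 1)
    # Integral over [0, inf) of exp(-j*y) * exp(-k*y) dy equals 1 / (j + k).
    gram = 1.0 / (k[:, None] + k[None, :])
    lower = np.linalg.cholesky(gram)
    return np.linalg.inv(lower)

def gram_by_quadrature(coeffs, n_points):
    """Inner products on [0, inf) evaluated in the variable s = exp(-y)."""
    n = coeffs.shape[0]
    x, w = np.polynomial.legendre.leggauss(n_points)
    s = 0.5 * (x + 1.0)                  # map nodes from [-1, 1] to [0, 1]
    w = 0.5 * w
    powers = s[None, :] ** np.arange(1, n + 1)[:, None]   # s**k = exp(-k*y)
    phi = coeffs @ powers
    # dy = -ds / s, so the unit-weight integral in y carries a factor 1/s.
    return (phi * (w / s)) @ phi.T

n = 5
coeffs = orthonormal_exp_coeffs(n)
deviation = np.abs(gram_by_quadrature(coeffs, n) - np.eye(n)).max()
print(f"max deviation from identity: {deviation:.1e}")
```

In the variable s the product of two such functions divided by s is a polynomial of degree 2n - 1, so n Gauss points integrate it exactly. A triple product, as produced by a quadratic nonlinearity tested against one more function, has degree 3n - 1 and needs about 3n/2 points, which may be one way to read the point count quoted above.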

The described spatial approximation results in a triangular matrix as the discrete analogue of the Laplacian, which allows one to employ explicit schemes in time. A variant of the Runge-Kutta method was therefore adopted for time integration.

This stage of discretization is described in detail in [4]. A collocation method is preferable for 3-D flow modelling. A variant of the collocation method, namely the combined direct method, is suggested in [7] for near-wall flow simulation.


NUMERICAL  RESULTS  OF  NATURAL  TRANSITION  SIMULATION 


The level of flow vorticity far from the wall y = 0 and the inflow boundary conditions define the influence of free stream turbulence in the R-domain. Indeed, recent experiments [19] show that high free stream vorticity ahead of a flat plate changes the Blasius profile and excites fast oscillations near the nose of the plate. So both temporally undamped slow perturbations and fast disturbances develop owing to changes of the inflow conditions at the boundary of the R-domain.

We shall consider here the simpler case of exponentially small free stream turbulence. In this case we shall suppose that the influence of temporally undamped slow perturbations is small for natural transition and that the slow part of the disturbed flow is one-dimensional in the boundary layer coordinates (y, A^o), so N = 0 in (15). We omit the terms of order O(X) in equations (11), so periodicity conditions are valid for the fast disturbances at the boundaries orthogonal to the wall. Such simplifications lead to an initial-boundary value problem that has no functional arbitrariness in space. We shall also suppose that modes of the continuous spectrum are not excited; our algorithm is constructed in such a way that the disturbed flow decays far from the wall in accordance with the asymptotics of the near-wall modes.

The physical parameters Re = 520 and a = 0.308 define the 2-D flow domain. The simulation parameters are K = 7, n = 32. These parameter values yield a dynamical system with 409 degrees of freedom. The simulation was performed for the interval 0 < t < 20000. Initial values of the amplitudes were determined from the solution of the Orr-Sommerfeld eigenvalue problem. The values correspond to initiation of a Tollmien-Schlichting wave with phase velocity equal to 0.396. The development of the disturbance divides into two parts. At first (t less than about 10000) a travelling wave regime with increasing amplitude arises. When the oscillation energy reaches a certain value, the single-wave regime is restructured and a regime close to oscillations with many frequencies is excited.
Let us is disturbed skin friction and
For steady flow regime the amplitudes at x - 0 and N


Power spectrum of is shown in Fig. 2,
harmonics, which oscillate at the frequencies of the peaks. One can see that apart from the main travelling wave, which has phase velocity equal to 0.566, other oscillations exist. Among these, the oscillation with the largest energy has convective speed equal to 0.809 $U_\infty$ and is excited by the second x-Fourier harmonic. It is of interest that each space scale has its own number of oscillation frequencies. It also appears that the near-wall travelling wave phase velocity practically coincides with the near-wall propagation velocity of perturbations in a channel [10]. In contrast with the result of [13], we have found that the phase velocities of pressure and friction are equal to each other near the wall. The decay rate of the skin friction x-Fourier harmonics is shown in Fig. 4 for simulation time t ~ 20000. One can see that the decay is rapid enough that the seventh harmonic is about 200 times smaller than the first.

Experience with direct numerical simulation leads to the conclusion that the non-dimensional time necessary to obtain fully developed flow is usually very long. Curiously, the corresponding physical time is quite short. Let $\tau$ be the dimensional time, so

$\tau = \nu \, Re \, t / U_\infty^2$.

If $U_\infty$ = 1 m/s and the fluid is water, then the physical simulation time is 10.4 s in the case examined. This time differs greatly from the computer time necessary for 2-D modelling. Simulation of a 3-D boundary layer is a more difficult problem, and a statistically steady solution has not been obtained up to now in this case (see, for instance, [18]).
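
The conversion from non-dimensional to physical time can be checked with a few lines of arithmetic. The sketch below assumes the usual scaling in which lengths are referred to a boundary layer thickness and velocities to the free stream velocity, so that the dimensional time is nu * Re * t / U_inf**2. The values nu = 1.0e-6 m^2/s (water at room temperature) and U_inf = 1 m/s are assumptions chosen here because they reproduce the quoted 10.4 s; the function name is hypothetical.

```python
def physical_time(t_nondim, reynolds, u_inf, nu):
    """Dimensional time in seconds for a non-dimensional time t_nondim."""
    return nu * reynolds * t_nondim / u_inf**2

# Re and the simulated interval are taken from the text; nu and u_inf are assumed.
tau = physical_time(t_nondim=20000.0, reynolds=520.0, u_inf=1.0, nu=1.0e-6)
print(f"{tau:.1f} s")
```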

CONCLUSION 

1. A new mathematical model based on the Navier-Stokes equations has been developed. The model can be effective for quantitative description of a class of weakly non-homogeneous flows. The model was tested by considering the flow stability problem near a flat plate.

2. To verify our model approach and discretization algorithms we have carried out long-time DNS of disturbed Blasius flow for various but moderate numbers of degrees of freedom.

3. We have found that a balance has to be observed between the numbers of functions taken in the orthogonal directions. If K is the number of Fourier harmonics taken in the longitudinal direction and n is the number of exponential polynomials taken in the direction orthogonal to the wall, then n = n(K) for successful execution of our algorithms. It is essential to note that increasing K requires increasing n.

4. Our experience of near-wall flow modelling leads to the conclusion that numerical solution breakdown, the so-called 'turbulence arising', does not correspond to the real physical phenomena in the boundary layer.

5. In our opinion we have found a statistically steady state of flow near a flat plate. This flow is a time-organized structure, which has a background of quasi-periodic oscillations with incommensurable frequencies.

1 SUMMARY

Structured sub-block refinement is a means to refine a mesh in certain areas within the flow region, in order to enhance the local resolution of the flow equations or flow solution without resorting to costly global mesh refinement. By the use of appropriate sensors, the regions of refinement can be defined while the flow solver is running, so that the adaptation becomes automatic. The use of structured refinement, i.e. refinement by block-like areas, requires only minor changes to the overall multi-grid iteration scheme. Strategies for the selection of sub-blocks and first results for 2D and 3D Euler and Navier-Stokes test cases are given. The drawbacks and the potential of the method are discussed.

2  LIST  OF  SYMBOLS 

$A$   continuous Navier-Stokes operator
$u$   continuous solution to Navier-Stokes system
$f$   right hand side of Navier-Stokes system
$I$, $I\!I$   transfer operators between meshes
$\tau$   local truncation error
I, J, K   indices of points in computational space

Subscripts and superscripts

$h$   discrete form referring to mesh h
$2h$   discrete form referring to mesh 2h
$(\cdot)_h^{2h}$   from mesh h to mesh 2h or mesh h relative to mesh 2h
~   approximation to ·

3  INTRODUCTION 

The process of discretization of the flow equations causes differences between the continuous solution of the Navier-Stokes system of differential equations and the solution of the system put onto the computer. This error is called local truncation error, and it plays the major role concerning solution deficiencies. Discretization errors, their magnitude and distribution over the flow region, are influenced by geometrical mesh properties as well as properties of the flow solution. Both types have to be taken into account when selecting appropriate sensors to drive adaptive flow solving algorithms.

All types or combinations of sensors result in single point-wise quantities which have to be scanned. A certain threshold determines whether a point or local region is already acceptable with respect to the expected error or whether it is a candidate for mesh adaptation.

In principle, mesh adaptation distinguishes between mesh enrichment and mesh movement. Mesh movement tries to improve the solution by shifting the existing mesh points to more appropriate positions. Mesh enrichment means refining the mesh, which leads to an increased number of mesh points. A coarsening of the mesh is also possible in regions where the quality measure is already good. Both approaches have their specific problems; it may be best to combine them.

Within this paper, we describe adaptive mesh enrichment strategies within a structured multi-block context. The principal structure of the flow solver shall not be affected by the local refinement. This means that refinement zones have to be of structured type, i.e. they must be regular mesh blocks. We use a concept of sub-blocks which has been developed within the Euromesh project of BRITE/EURAM and the ECARP project of IMT Area 3 of CEC research. This concept allows sub-blocks to be treated as additional levels of refinement in the usual multi-grid sequence of the MELINA flow solver (Fig. 1) [RilBec92].

Structured mesh enrichment of the form described above has its drawbacks when tracing features of the flow which run diagonally through the mesh. Unless the size of the sub-blocks is limited, quite large refinement zones must be expected. We are therefore aware that this special type of mesh enrichment will not be the ultimate solution to mesh adaptation, but a first and practicable one.

4  SUB-BLOCK  APPROACH 

The idea of structured sub-block refinement is simply to patch locally refined mesh blocks onto the existing mesh and to connect the additional fine sub-blocks with the original mesh via a multigrid technique. A sub-block has to lie completely within a grid block of the existing mesh, which includes touching the block boundary. However, a grid block may have several sub-blocks, and a sub-block may itself have several sub-blocks (see Fig. 2).

The sub-block approach can be viewed as a compromise between structured and unstructured meshes, combining the benefit of high computational efficiency on structured meshes with that of clustering grid points in a "quasi unstructured" way by scattering sub-blocks and even further refined blocks in regions of discretization errors. It is envisaged to use this method for solution adaptive mesh refinement, with the regions of sub-block refinement determined automatically during the iteration by suitable sensor functions.

4.1  Surface  and  Interior  Point  Definition 

When a sub-block is created, an intermediate fine grid point has to be introduced between each two neighbouring mesh points on a coarse grid line.

On any component's surface, this new point has to lie on the surface. This means that the new point has to be constructed using the original surface definition. However, this causes severe problems if the surfaces are defined by external CAD means, for example. Therefore, special interpolation procedures are most often used which create local surface approximations from the existing coarse mesh points. The individual approaches differ in the quality of surface representation. For aerodynamics, the criteria of absolute distance to the real CAD surface and waviness of the interpolated surface play the major role. For the moment, we do not want to stress this problem: we simply use Coons' local patches.

The definition of interior fine mesh points is not as constrained. As long as Euler meshes are considered, those mesh points can be constructed using simple trilinear interpolation of the coarse cells in the field.
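
A minimal sketch of such an interior refinement is given below. It is not taken from the MELINA or INGRID codes; it simply inserts one midpoint between neighbouring coarse points along each index direction in turn, which is equivalent to trilinear interpolation at edge, face and cell centres. The function names and the array layout (one array per coordinate or field) are assumptions.

```python
import numpy as np

def refine_axis(values, axis):
    """Insert the midpoint between neighbouring points along one index direction."""
    a = np.moveaxis(np.asarray(values, dtype=float), axis, 0)
    out = np.empty((2 * a.shape[0] - 1,) + a.shape[1:])
    out[0::2] = a                        # coarse points are kept unchanged
    out[1::2] = 0.5 * (a[:-1] + a[1:])   # new points halfway between neighbours
    return np.moveaxis(out, 0, axis)

def refine_block(values):
    """Refine a structured 3D block of shape (ni, nj, nk) to (2ni-1, 2nj-1, 2nk-1)."""
    fine = values
    for axis in range(3):
        fine = refine_axis(fine, axis)
    return fine

coarse_x = np.arange(8, dtype=float).reshape(2, 2, 2)
fine_x = refine_block(coarse_x)
print(fine_x.shape, fine_x[1, 1, 1])
```

Applied separately to the x, y and z coordinate arrays of a coarse block, this yields the fine interior points; points on solid surfaces would still need the surface treatment discussed above.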

For the very dense Navier-Stokes meshes, intersections of field mesh lines with the true boundary are very likely to occur in the vicinity of a curved surface when trilinear interpolation is used. Therefore the filling algorithm has been changed to use Coons' representation for each mesh plane parallel to the surface, not only the surface planes. This guarantees smooth behaviour of the mesh in the whole sub-block, especially in the boundary layer mesh. Additionally, it avoids any intersection of mesh lines or planes with fixed surfaces. Because this approach is so robust, fast and easy, we also adopted it for the Euler meshes.
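
For readers unfamiliar with Coons' representation, the sketch below evaluates the standard bilinearly blended Coons patch from four discrete boundary curves. It is a generic textbook form and not necessarily the local patch variant used by the authors; uniform parametrization along each boundary and the function name are assumptions.

```python
import numpy as np

def coons_patch(bottom, top, left, right):
    """Bilinearly blended Coons patch.

    bottom, top: arrays (nu, d) for v = 0 and v = 1; left, right: arrays
    (nv, d) for u = 0 and u = 1. The four corner points must coincide.
    """
    bottom, top = np.asarray(bottom, float), np.asarray(top, float)
    left, right = np.asarray(left, float), np.asarray(right, float)
    u = np.linspace(0.0, 1.0, len(bottom))[:, None, None]
    v = np.linspace(0.0, 1.0, len(left))[None, :, None]
    ruled_v = (1 - v) * bottom[:, None, :] + v * top[:, None, :]
    ruled_u = (1 - u) * left[None, :, :] + u * right[None, :, :]
    corners = ((1 - u) * (1 - v) * bottom[0] + u * (1 - v) * bottom[-1]
               + (1 - u) * v * top[0] + u * v * top[-1])
    return ruled_v + ruled_u - corners

# Quarter annulus between radii 1 and 2 as a small example.
angle = np.linspace(0.0, 0.5 * np.pi, 7)
radius = np.linspace(1.0, 2.0, 5)
left = np.column_stack([np.cos(angle), np.sin(angle)])            # inner arc
right = 2.0 * left                                                # outer arc
bottom = np.column_stack([radius, np.zeros_like(radius)])         # angle 0
top = np.column_stack([np.zeros_like(radius), radius])            # angle pi/2
patch = coons_patch(bottom, top, left, right)
print(patch.shape)
```

The patch reproduces all four boundary curves exactly, which is the property that keeps interpolated mesh planes from crossing a curved wall.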

4.2  Communication  between  Sub-blocks  and 
Coarse  Blocks 

In general, sub-blocks cover only part of the computational domain. Boundary conditions on their outer block boundaries must be defined such that there is no algorithmic influence on the overall flow solution. Within the multigrid context, flow variables are interpolated from the coarse mesh. If the sub-block boundary touches the coarse block boundary, the same boundary condition is applied. Wall, symmetry or similar conditions are thus treated correctly. Special treatment is required if the sub-block boundary lies inside the coarse block. Boundary values of the sub-block cannot be set as fixed Dirichlet type conditions because this conflicts with the mixed type nature of the flow equations. The interpolated values serve only as an initial guess, and the values are updated using the original flow equations themselves on the fine mesh. Therefore at least one row of guard cells has to be created around the sub-block which contains the flux integral information needed for the application of the cell vertex discretization at the real sub-block boundary. This procedure is much the same as that applied between two adjacent blocks of the original non-refined mesh. In addition, conservation has to be ensured across the sub-block boundaries. In our code, this is achieved by replacing the flux integrals along coarse cell faces at the sub-block boundary location: the coarse mesh integral is replaced by the sum of the participating fine mesh integrals.
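
The flux replacement can be sketched for a 2D interface, where each coarse cell face is covered by exactly two fine faces. This is an illustration only, with a hypothetical function name and data layout; in 3D each coarse face would collect four fine faces instead.

```python
import numpy as np

def coarse_flux_from_fine(fine_face_fluxes):
    """Sum pairs of fine-face flux integrals onto the coinciding coarse faces.

    fine_face_fluxes: array (2 * n_coarse_faces, n_equations), ordered along
    the sub-block boundary.
    """
    fine = np.asarray(fine_face_fluxes, dtype=float)
    if fine.shape[0] % 2 != 0:
        raise ValueError("expected two fine faces per coarse face")
    return fine[0::2] + fine[1::2]

fine = np.array([[1.0, 0.1], [2.0, 0.2], [3.0, 0.3], [4.0, 0.4]])
coarse = coarse_flux_from_fine(fine)
print(coarse)
```

Because the coarse value is the plain sum of the fine values, the total flux through the interface is the same on both levels, which is the conservation property required above.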

This type of communication between sub-blocks and blocks is managed with the help of the face group concept. Each block has at least one face group. This group of six faces consists of the minimum/maximum index planes (boundaries of the computational domain) of the block. For each sub-block that is added to the coarse block, a new face group is defined. It contains those segments of coarse mesh planes that coincide with the block boundaries of the respective local sub-block. This face group is thus the hull of the sub-block inside the coarse block. The respective topological description data are used to drive the communication of flow variables and other relevant data within the flow solver.

Figure 2: Sub-blocks within a mesh block - schematic view.

If two sub-blocks within a coarse block or across the boundaries of two coarse blocks are adjacent to each other, then communication should be allowed directly between those sub-blocks. The simplest way is to transfer data from a sub-block to the respective face group of the coarse block and from there to the neighbouring sub-block. However, this path of communication contains interpolation errors and should thus be replaced by the immediate transfer of data from one sub-block to its neighbour. Within the topological description data, this problem could be easily solved because sub-blocks are treated in the same way as usual blocks.

If a new sub-block is constructed, the topological data are updated automatically. Boundary conditions and connections to adjacent sub-blocks are detected and included in the description. This makes the fully adaptive incorporation of new sub-blocks into an existing multi-block mesh relatively easy once the respective coarse mesh face group boundaries are known. One major technical difficulty is the generality of sub-block to sub-block connections. Up to now, two sub-blocks of a coarse block are only allowed to touch each other with one full face. Touching with only part of a face would require new segmentation of the respective faces and can easily result in very complex face segmentations. On the other hand, the above restriction hinders an effective treatment of diagonal refinement. For the moment, the drawback of full face touching has to be overcome by resizing the respective sub-blocks. However, part-of-face touching is under development.

Several topologically different sub-block configurations have been tested. Because the sensor evaluator may suggest quite general additions of sub-blocks, such arrangements need to run robustly.


For example, if we have a four block finest mesh, in a first adaptation step sub-blocks might be suggested for only three blocks. This leads to different finest levels on different blocks within the multi-grid cycles. Additionally, consecutive sub-blocking during subsequent adaptation loops has to be allowed, which means that sub-sub-sub-...blocks can occur. Such and similar conditions have been investigated concerning the convergence behaviour and the quality of solution, especially at the junction of refined and non-refined regions. No specific problem has been detected with the Euler flow solver. However, with the Navier-Stokes solver it turned out that the implementation of the turbulence model has a great impact. In practice, the Baldwin-Lomax model used requires wall distance information. This information is very difficult to obtain in general multi-block meshes if it is not evaluated in a preprocessing step.

5  SENSOR  EVALUATION 

The evaluation of any sensor field always means scanning the field for a pre-specified range of values that are considered to indicate deficiencies of solution accuracy. We can distinguish between sensors that depend on the flow solution itself and sensors that are defined by purely geometrical quantities. Mathematical analysis of discretization leads to certain guidelines concerning the mesh. One of those rules is that one should use smooth and orthogonal meshes. Measures of those quantities can thus be used to determine "bad" regions within an existing grid. On the other hand, the flow itself shall drive the mesh in order to properly resolve special features like shocks, stagnation regions, boundary layers or shear layers. The analysis of the respective sensors leads to suggestions for regions of enhanced grid density.

5.1 Flow Independent Sensors

Within the BRITE/EURAM Euromesh project, a palette of geometrical quality measures has been developed. At present we use these measures mainly for a priori qualification of meshes. Within the DA mesh generation system INGRID, the following 3D measures are implemented: orthogonality, skewness, aspect ratio and expansion rate for 3 index directions. None of them leads to an absolute criterion for mesh quality. Orthogonality, for example, cannot be achieved in the whole mesh if angles other than 90 degrees appear on the surface. Respecting those angles is necessary for high quality surface representation, but clearly violates the principle of orthogonality. Similar statements can be made for the other quantities. Nevertheless, those quantities should be taken into account when creating base meshes.

Figure 3: Local truncation error estimate for a NACA0012 Euler case - computational space and physical space - left: $\tau$(continuity equation), right: $\tau$(2nd momentum equation).
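
To make the idea of geometrical quality measures concrete, the sketch below computes two of them, orthogonality and aspect ratio, for a 2D structured mesh. The precise definitions implemented in INGRID are not given in the text, so the ones used here (angle between the two grid-line directions at the lower-left node of each cell, and ratio of the longer to the shorter of those two edges) are assumptions, as is the function name.

```python
import numpy as np

def cell_quality(x, y):
    """Return (angle in degrees, aspect ratio) per cell of a 2D structured mesh.

    x, y: node coordinate arrays of shape (ni, nj).
    """
    # Edge vectors leaving node (i, j) along the i and j index directions.
    edge_i = np.stack([x[1:, :-1] - x[:-1, :-1], y[1:, :-1] - y[:-1, :-1]], axis=-1)
    edge_j = np.stack([x[:-1, 1:] - x[:-1, :-1], y[:-1, 1:] - y[:-1, :-1]], axis=-1)
    len_i = np.linalg.norm(edge_i, axis=-1)
    len_j = np.linalg.norm(edge_j, axis=-1)
    cos_angle = np.sum(edge_i * edge_j, axis=-1) / (len_i * len_j)
    angle = np.degrees(np.arccos(np.clip(cos_angle, -1.0, 1.0)))
    aspect = np.maximum(len_i, len_j) / np.minimum(len_i, len_j)
    return angle, aspect

i, j = np.meshgrid(np.arange(4.0), np.arange(3.0), indexing="ij")
angle, aspect = cell_quality(i + j, j)     # mesh sheared by 45 degrees
print(angle[0, 0], aspect[0, 0])
```

Cells whose angle deviates strongly from 90 degrees or whose aspect ratio is large could then be flagged, subject to the caveat above that neither measure is an absolute criterion.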

5.2  Flow  Dependent  Sensors 

The first and most likely reason for deficiencies in solution accuracy is too high a level of local truncation error. This error describes in principle how well the nonlinear operators of the Navier-Stokes equations are approximated by the discrete differentiation and integration rules on a specific mesh. It must be remembered that there is no locality in the relation between this error and the global truncation error of the solution itself, i.e. the solution error can occur at quite different locations than the local truncation error [Klim95]. This is especially due to the transport character of the equations.

Truncation error estimates can be extracted directly from the multi-grid cycles: specific differences between medium and coarse mesh residuals in a three level computation yield an estimate of the local truncation error $\tau$ [Bra77]. This estimate for all equations of the Euler or Navier-Stokes system is used to define the locally refined (fine) mesh level.

In detail, if the continuous equation

$A u = f$  (1)

is discretised on a mesh with typical mesh size h,

$A_h u_h = f_h$  (2)

where $u_h$ is the discrete solution, then the local truncation error $\tau_h$ is defined by

$\tau_h = A_h u - A u$  (3)

If we further add and subtract the discrete operator applied to an approximation $\tilde{u}_h$ of the discrete solution $u_h$,

$A_h u_h = f_h - A_h \tilde{u}_h + A_h \tilde{u}_h$,  (4)

and represent this equation on the next coarser grid with mesh size 2h, then we end up with the multigrid coarse grid correction equation

$A_{2h} u_{2h} = I\!I_h^{2h} (f_h - A_h \tilde{u}_h) + A_{2h} I_h^{2h} \tilde{u}_h$,  (5)

which contains the local truncation error estimate on mesh 2h relative to mesh h:

$\tau_h^{2h} = A_{2h} I_h^{2h} \tilde{u}_h - I\!I_h^{2h} A_h \tilde{u}_h$.  (6)

Under the assumptions $A_h \approx A$ and $\tilde{u}_h \approx u$ this yields

$\tau_h^{2h} \approx A_{2h} u - A u$,  (7)

which is the local truncation error $\tau_{2h}$ on mesh 2h. Fig. 3 gives an impression of the distribution and the levels of local truncation error for a 2D transonic test case. During the studies it was found very useful to have presentations of the estimate in the physical as well as in the computational domain. Interestingly, the errors for the individual equations seem to be complementary to each other. Near the nose of the airfoil, for example, $\tau$(continuity equation) suggests refinement in other parts of the flow field than $\tau$(momentum equations). For our investigations, the L1-norm over all equations was taken to drive the refinement. More detailed studies can be found in [Lau95].

Figure 4: Suggestions for new sub-blocks - left: RADIUS=5, right: RADIUS=2.
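
The relative truncation error estimate can be tried out on a much simpler model problem. The sketch below uses the 1D operator -d2/dx2 with second-order central differences in place of the Euler or Navier-Stokes operators, and plain injection for both transfer operators; these choices, the test function sin(pi x) and all names are assumptions made for illustration and are not the discretization used in the paper.

```python
import numpy as np

def apply_neg_laplacian(u, h):
    """Second-order discrete -d2/dx2 at the interior points of a uniform mesh."""
    return (-u[:-2] + 2.0 * u[1:-1] - u[2:]) / h**2

def relative_truncation_error(u_h, h):
    """Coarse operator on the injected solution minus the injected fine operator."""
    coarse_of_injected = apply_neg_laplacian(u_h[::2], 2.0 * h)
    fine_operator = apply_neg_laplacian(u_h, h)     # fine interior points 1..n-1
    injected_fine_operator = fine_operator[1::2]    # fine points 2, 4, ..., n-2
    return coarse_of_injected - injected_fine_operator

n = 64                                   # number of fine intervals (even)
h = 1.0 / n
x = np.linspace(0.0, 1.0, n + 1)
u = np.sin(np.pi * x)                    # stands in for the fine-mesh solution
estimate = relative_truncation_error(u, h)

x_coarse = x[::2][1:-1]
exact_tau_2h = apply_neg_laplacian(u[::2], 2.0 * h) - np.pi**2 * np.sin(np.pi * x_coarse)
ratio = np.linalg.norm(estimate) / np.linalg.norm(exact_tau_2h)
print(f"estimate / exact = {ratio:.3f}")
```

For this second-order scheme the estimate recovers roughly three quarters of the exact coarse-mesh truncation error, because the fine-mesh operator itself still carries one quarter of it. The spatial distribution, which is what a refinement sensor needs, is reproduced.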

5.3  Sub-block  Definition 

In the context of structured sub-block refinement, a strategy has to be developed by which the location and extent of local sub-blocks can be determined. The evaluation of any sensor defines a set of "bad" points or cells that appear as clouds in the index space of each structured block. On the one hand, the sub-blocks have to cover those clouds. On the other hand, the size of the sub-blocks corresponds to the numerical effort and thus has to be as small as possible. The strategy to define reasonable sub-blocks is as follows:

•  Find  a  first  bad  point  (I,J,K). 

•  Set  IMIN=IMAX=I-index  of  bad  point;  same  with 
J  and  K  indices. 

• Trace the surroundings (IMIN-RADIUS, IMAX+RADIUS; ...) of the current (IMIN,IMAX; JMIN,JMAX; KMIN,KMAX) area for more bad points.

•  If  any  more  bad  point  has  been  identified,  en¬ 
large  the  respective  MIN/MAX  values  and  restart 
search. 

•  If  no  more  bad  point  can  be  found,  define  the  sub¬ 
block  from  the  current  MIN/MAX  values. 
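
The steps above can be sketched as follows. This is a reading of the listed strategy, not the authors' code: the function name, the boolean-array input and the handling of several clouds (points covered by a finished box are removed before the next seed is sought) are assumptions, and the bounds on maximum and minimum sub-block size as well as possible overlaps between suggested boxes are not treated.

```python
import numpy as np

def suggest_sub_blocks(bad, radius):
    """Return a list of ((imin, jmin, kmin), (imax, jmax, kmax)) index boxes."""
    remaining = np.asarray(bad, dtype=bool).copy()
    upper = np.array(remaining.shape) - 1
    boxes = []
    while remaining.any():
        seed = np.argwhere(remaining)[0]          # first bad point found
        lo, hi = seed.copy(), seed.copy()
        while True:
            # Search window: current box enlarged by RADIUS, clipped to the block.
            win_lo = np.maximum(lo - radius, 0)
            win_hi = np.minimum(hi + radius, upper)
            window = tuple(slice(a, b + 1) for a, b in zip(win_lo, win_hi))
            hits = np.argwhere(remaining[window]) + win_lo
            new_lo = np.minimum(lo, hits.min(axis=0))
            new_hi = np.maximum(hi, hits.max(axis=0))
            if np.array_equal(new_lo, lo) and np.array_equal(new_hi, hi):
                break                             # no more bad points nearby
            lo, hi = new_lo, new_hi               # enlarge and restart the search
        box = tuple(slice(a, b + 1) for a, b in zip(lo, hi))
        remaining[box] = False
        boxes.append((tuple(int(v) for v in lo), tuple(int(v) for v in hi)))
    return boxes

bad = np.zeros((20, 20, 1), dtype=bool)
bad[2:4, 2:4, 0] = True                           # first cloud of bad points
bad[8:10, 2:4, 0] = True                          # second cloud, 5 indices away
print(len(suggest_sub_blocks(bad, radius=2)), len(suggest_sub_blocks(bad, radius=5)))
```

In this small example the two clouds are merged into a single box for the larger search radius and kept separate for the smaller one.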


The user-given tolerance value RADIUS has a strong influence on the size of the sub-blocks. It also defines the minimum distance between two sub-blocks within one block. In order to avoid very large sub-blocks that look more like a global mesh refinement, the maximum size of sub-blocks must be bounded. Also, many small sub-blocks may be created if every singular bad point is taken into account. This can be prevented by a minimum bound for the number of points within a sub-block.

Fig. 4 shows the index cube representation of suggestions for sub-blocks within a coarse block. In the first case, a RADIUS of 5 was chosen, whereas in the second case the RADIUS value was 2, resulting in one more sub-block of smaller size.

5.4  Adaptation  Cycle 

Mesh enrichment via sub-blocks should run automatically within the flow solution process. However, for the development of such a method it is reasonable to combine the individual elements of code in a looser form. The adaptation cycle has been split into 4 steps:

•  Start  calculation  on  a  reasonably  fine  mesh  and 
store  the  results  (mesh,  flow  solution,  local  trun¬ 
cation  error), 

•  Run  the  sub-block  suggestion  code  and  store 
MIN/MAX  indices  for  each  coarse  block, 

•  Generate  the  enriched  mesh  which  contains  the 
previous  mesh  and  the  new  sub-blocks, 

• Restart the flow solver using interpolated values as starting solution for the new sub-blocks.

This cycle can be run until the maximum number of refinement levels has been reached. It is assumed that at each step only the currently finest level can be refined.

Figure 5: Pressure distribution RAE2822 for meshes (N1), ..., (N4) - comparison with experiment.

6  NUMERICAL  RESULTS 

The sub-block concept described above has been implemented in 3D. However, for cost reasons and for first validation purposes it is reasonable to begin with 2D Euler and Navier-Stokes flows. The basis of the 3D Euler and Navier-Stokes investigations on local refinement and adaptation was a wing/body combination.

6.1  2D  Test  Cases 

Local mesh refinement was first tested in 2D: RAE2822 test case 9 with a free stream Mach number of 0.734, angle of attack of 2.54 degrees and Reynolds number of 6.5 million. The Navier-Stokes calculation should serve as a preliminary test to show the effect and effectiveness of local refinement. Refinement was done by hand using sub-blocks which covered the whole upper surface including the supersonic region and extended slightly onto the lower surface near the nose of the airfoil. Fig. 5 shows the resulting pressure distributions for different meshes and the experimental values. Four different meshes were chosen:

(N1) standard fine C-mesh with 241 x 77 mesh points, about 30 points normal to the wall in the boundary layer,

(N2) mesh (N1) coarsened once by omitting every second point, with 121 x 39 mesh points,

(N3) mesh (N2) with sub-blocks and

(N4) mesh (N1) with sub-blocks.
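
The point counts of (N1) and (N2) can be checked with a one-line rule: keeping every second point of a line of n points, starting with the first, leaves (n + 1) / 2 points when n is odd. The helper name below is purely illustrative.

```python
def coarsened_count(n_points):
    """Number of points left after omitting every second point (first point kept)."""
    return len(range(0, n_points, 2))


fine = (241, 77)                                   # mesh (N1)
coarse = tuple(coarsened_count(n) for n in fine)   # mesh (N2)
print(coarse, coarse[0] * coarse[1] / (fine[0] * fine[1]))
```

The second printed number is the ratio of coarse to fine point counts, which is close to one quarter for a 2D mesh coarsened once in both directions.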


The RAE2822 test case 9 has often been used for validation purposes. Problems with the suction peak on the upper surface near the nose have consistently been reported, as is the case with the present results. The current computations have been made for fully turbulent flow.

If (N1) is assumed to be a mesh of usual fineness, the (N1) result should be the target for adaptation. Results produced with the coarser mesh (N2) clearly show large numerical errors. If (N2) is refined locally as described above, which gives (N3), the result is already very close to the target (N1). However, the computing time is only about 40 percent of the (N1) computation, as can be seen from Fig. 6. Additional local refinement of (N1), which gives (N4), yields a solution that is again closer to the experiment, both near the nose and for the pressure gradient in front of the shock. For cost reasons, a target computation for a globally refined (N1) mesh has not been performed; the experiment has been used instead. In the (N4) case, convergence of the lift coefficient is reached at only minor additional expense compared to (N1).

Fig. 6 shows the convergence behaviour of the method for the different meshes. The residuals for all cases drop very quickly with CPU time. No particular effects are observed when embedded sub-blocks are present. However, the current implementation of the Baldwin-Lomax turbulence model in the MELINA flow solver may cause problems if the sub-block cuts the mesh within the boundary layer. If such a sub-block does not extend down to the wall surface, the wall distance needed by the turbulence model does not have the correct values, which may lead to poor results or even non-convergence of the overall algorithm. This state of the flow solver currently hinders automatic adaptation in any complex case.

6.2 3D Wing/Body Test Case

The application of the sensor analysis implemented in the ADAPTOR code [LauMau95] to the F4 wing/body Navier-Stokes test case showed favourable properties. As can be seen from Figs. 7 and 8, with the current base mesh the τ-error is concentrated in the vicinity of the configuration. The sensor clearly detects

•  the  body  nose  region  as  being  not  properly  re¬ 
solved, 

•  the  wing  nose  region  as  spurious  entropy  produc¬ 
tion  region, 

•  the  shock  region  as  being  insufficiently  resolved  for 
steep  gradients, 

•  the  sonic  line  as  being  sensitive  to  numerical  errors, 

•  the  trailing  edge  and  wake  region  as  being  sensi¬ 
tive  because  of  rapidly  changing  flow  including  free 
shear  layers  and 

•  the  boundary  layer  near  the  wall  where  pre¬ 
adaptation  of  the  mesh  to  the  boundary  layer  pro¬ 
files  is  only  possible  up  to  a  certain  extent. 

This suggests that automatic recognition of deficiencies in the discretization is possible, and that adaptation will reduce the overall local truncation error.


Figure 6: Convergence behaviour for meshes (N1),..,(N4) - residual and lift coefficient against CPU time.


Figure 7: F4 wing/body configuration - continuity equation truncation error estimate for surface and symmetry plane.


Figure 8: F4 wing/body configuration - truncation error estimate of x-momentum equation for spanwise mesh plane.


Figure 9: F4 wing/body configuration - Mach contours and sub-blocks at spanwise mesh plane.


Figure 10: F4 wing/body configuration - cp at 63 percent (left) and 52 percent (right) wing span; curves shown: fine reference, non-refined, with embedding.

The ADAPTOR run with the medium grid truncation error estimate of all 5 flow equations leads to multiple sub-blocks. It mainly suggests a block in the vicinity of the surface along the whole span of the wing, starting somewhere below and behind the wing and extending over the upper surface to a position again behind the trailing edge. A second block covers the off-surface region around the wing nose and extends over the supersonic region. Parts of the sub-blocks can be seen in Fig. 9, where a spanwise cut with local Mach contours is shown. The pressure distributions at two mid-wing cuts show quite a good improvement compared to the coarse mesh solution (Fig. 10). The suction peak as well as the pressure roof top gradient and the shock position are in good agreement with the fine mesh reference solution. Moreover, the locally refined mesh has only about 40 percent of the points of the global fine mesh.

7  CONCLUSIONS 

Mesh enrichment based on a structured sub-block approach has been considered as an effective way to improve the numerical solution of the flow equations. Tools have been defined and strategic provisions have been made to test this approach under industrial constraints. Up to now, the main procedures have been set up. Results for locally refined meshes have been calculated for 2D and 3D Euler and Navier-Stokes test cases. The next step will be the full integration of the adaptation into the flow solver and the validation and improvement of the overall process.

Because of the problems with the turbulence model implementation in Navier-Stokes computations, we will first try to sort out the automatic adaptation problems, sensor analysis, etc. on the basis of the Euler equations. More general sub-block-to-sub-block connections are under development which allow a more cost-effective resolution of diagonal flow features.

Many tests have been run with the ADAPTOR code, and many changes to the evaluation strategy have been necessary in order to find reasonable suggestions for sub-blocks. The cost saving of more than 50 percent achieved with the current examples is already quite good under industrial conditions. Ongoing work will concentrate on making adaptation fully automatic, robust and efficient.

8  ACKNOWLEDGEMENTS 

The  basis  of  this  work  has  been  partly  conducted  as 
BRITE/EURAM  area  5,  CEC  funded,  applied  research. 
Recent  results  have  been  obtained  within  the  IMT  areaS, 
CEC  funded,  ECARP  project.  We  are  grateful  for  this 
support. 

References 

[Bec93] Becker, K., Aumann, P.: "The Interactive Grid Generation System INGRID - Version 5.0", DA-report, December 1993.

[Bra77] Brandt, A.: "Multi-Level Adaptive Solutions to Boundary Value Problems", Mathematics of Computation, Vol. 31, No. 138, pp. 333-390, April 1977.

[Klim95] Klimetzek, F.: "Fehlerestimatoren für Strömungsberechnungsverfahren, Teil I (Literaturauswertung, 1993, 93-038) und II (Anwendung und Beurteilung, 1994, 94-091)", Technical Report, Daimler-Benz AG.

[Lau95] Lauke, Th.: "Adaption von Rechennetzen zur Steigerung der Genauigkeit von 3D-Strömungssimulationen", Diplomarbeit, Technical University of Berlin, June 1995.

[LauMau95] Lauke, Th., Mauch, H.: "ADAPTOR User's Manual", Daimler-Benz Aerospace Airbus, June 1995.

[RilBec92] Rill, S., Becker, K.: "MELINA - A Multi-Block, Multi-Grid 3D Euler Code with Local Sub-Block Technique for Local Mesh Refinement", ICAS Paper 92-4.3.R, ICAS Conf., Beijing, Sept. 1992.




Multiblock Structured Grid Algorithms for Euler Solvers
in a Parallel Computing Framework


Stefano Sibilla
Aermacchi S.p.A.
Dipartimento di Aerodinamica
Via Foresio, 1 21040 Venegono Superiore (VA) Italy

and

Marcello Vitaletti
IBM Semea S.p.A.
E.C.S.E.C.
Piazza G. Pastore, 6 00144 Roma Italy


SUMMARY 

Specific algorithms have been developed for the numerical solution of the Euler equations on multiblock structured grids of general topology; these algorithms involve the determination of convective and dissipative fluxes, residual collection from fine grid levels during multigrid cycles and time step evaluation. They must be properly integrated with residual and flow variable averaging when the internal boundary condition is introduced.

The influence of block subdivision on the bow-shock in front of a blunt-nosed body is analysed with different multiblock algorithms; a structured and a locally unstructured topology are also compared.

Results show that no additional error is introduced in multiblock solutions if internal block boundary conditions are applied at each stage and edge/corner boundary cell contributions to flow quantities are properly taken into account.


LIST  OF  SYMBOLS 

a  speed  of  sound 

Cd  drag  coefficient 

CFL  Courant  number 

Cpg,  stagnation  pressure  coefficient 

D  dissipative  flux 

E  specific  energy 

H  specific  enthalpy 

p  pressure 

Q  convective  flux 

q  flow  quantity  vector 

R  residual

s  cell face area vector

u,v,w  velocity components

V  cell volume

V_CV  control volume

x,y,z  Cartesian coordinates

γ  specific heat ratio

Δt  time step

ε  numerical viscosity coefficient

Λ  spectral radius

ν  pressure sensor

ξ,η,ζ  curvilinear coordinates

ρ  density

1.  INTRODUCTION 

Multiblock methods consist of the decomposition of complex computational domains into simpler subdomains, which can be handled more easily both in the management of the simulation and in the subdivision of the computational task among different processors.

Structured  grid  blocks  can  be  generated  in  these 
subdomains,  in  order  to  combine  the  efficiency 
and  simplicity  of  CFD  algorithms  developed  for 
single-block  structured  grids  with  the  geometric 
flexibility  needed  to  describe  topologically 
complex  regions. 

The  main  difficulty  in  multiblock  methods  lies 
in  the  correct  treatment  of  block  interfaces, 
which  are  located  in  the  flow  region  and  repre¬ 
sent  a  numerical  boundary  condition  with  no 
reference  to  the  physical  problem:  their  presence 
can  introduce  errors  in  the  solution  which  can 
either  prevent  complete  convergence  to  the  exact 
solution  or  impose  constraints  on  the  grid  gene¬ 
ration. 

IBM has developed a parallel multiblock framework called PARAGRID [1,2] which provides suitable data structures for the management of data communication between adjacent blocks. The computation is performed in parallel mode at block level, thus allowing exploitation of workstation clusters and/or multi-processor systems.

A structured multiblock Euler solver had previously been implemented within this framework [3]; its results were generally good as far as overall solution quality and the evaluation of global aerodynamic coefficients were concerned. Problems were nevertheless detected in the convergence rate and in the solution quality at the interfaces between adjacent blocks; moreover, only "structured" block topologies were solved consistently with the original structured algorithm: this means, for example, that only internal edges shared by four blocks or corners shared by eight blocks were allowed; for all other block topologies, approximate corrections were introduced.

Some solution algorithms were therefore modified in order to account for the presence of locally unstructured topologies at block boundaries; these algorithms were designed for application in a parallel environment, minimizing the number of data exchanges between adjacent blocks, and therefore the communications between computational nodes.


2.  NUMERICAL  SCHEME 
2.1  Finite  volume  formulation 

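The three-dimensional Euler equations

$$ \frac{\partial q}{\partial t} + \frac{\partial f}{\partial x} + \frac{\partial g}{\partial y} + \frac{\partial h}{\partial z} = 0 \tag{1} $$

where

$$ q = \begin{bmatrix} \rho \\ \rho u \\ \rho v \\ \rho w \\ \rho E \end{bmatrix}, \quad f = \begin{bmatrix} \rho u \\ \rho u^2 + p \\ \rho u v \\ \rho u w \\ \rho u H \end{bmatrix}, \quad g = \begin{bmatrix} \rho v \\ \rho u v \\ \rho v^2 + p \\ \rho v w \\ \rho v H \end{bmatrix}, \quad h = \begin{bmatrix} \rho w \\ \rho u w \\ \rho v w \\ \rho w^2 + p \\ \rho w H \end{bmatrix} \tag{2} $$

are written in integral form

$$ \frac{\partial}{\partial t} \int_{V} q \, dV + \oint_{\partial V} (f, g, h) \cdot d\mathbf{s} = 0 \tag{3} $$
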
and are solved through a cell-vertex finite volume space discretization [4]: flow quantity values located at cell corners represent average values of flow quantities in the control volume made of all the cells (e.g. 8 for an internal node of a structured grid) sharing that node. Convective fluxes through the control volume surface, which are represented by the second term on the left hand side of (3), are computed as the sum of the contributions of all the cell faces which form the control volume surface itself; face values are taken as the average of the values at the corners of the face.
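
As a minimal sketch of this face-based flux evaluation, the Python fragment below averages the corner states of each face and accumulates the Euler flux through the face area vector. The function names and the data layout (a list of corner-state/area-vector pairs) are assumptions made for illustration; they do not reproduce the data structures of the actual solver.

```python
def face_average(corner_states):
    """Average of the state vectors stored at the corners of one face."""
    n = len(corner_states)
    return [sum(state[m] for state in corner_states) / n
            for m in range(len(corner_states[0]))]


def flux_through_face(q, s, gamma=1.4):
    """Euler flux (f, g, h) . s for state q = [rho, rho*u, rho*v, rho*w, rho*E]."""
    rho, ru, rv, rw, rE = q
    u, v, w = ru / rho, rv / rho, rw / rho
    p = (gamma - 1.0) * (rE - 0.5 * rho * (u * u + v * v + w * w))
    vn = u * s[0] + v * s[1] + w * s[2]          # volume flux through the face
    H = (rE + p) / rho                           # specific total enthalpy
    return [rho * vn,
            ru * vn + p * s[0],
            rv * vn + p * s[1],
            rw * vn + p * s[2],
            rho * H * vn]


def net_convective_flux(faces, gamma=1.4):
    """Sum of face contributions; faces is a list of (corner_states, area_vector)."""
    total = [0.0] * 5
    for corner_states, s in faces:
        flux = flux_through_face(face_average(corner_states), s, gamma)
        total = [t + f for t, f in zip(total, flux)]
    return total
```

Because the sum only needs the list of faces bounding the control volume, the same routine applies whatever the number of faces is. A useful sanity check is that a uniform flow through any closed surface gives a zero net flux.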

This scheme is equivalent to a second-order accurate central difference on a Cartesian grid; such a discretization leads to odd-even decoupling, allowing numerical oscillations, and provides no intrinsic numerical dissipation to damp these oscillations and other non-linear instabilities. A dissipative term, based on first- and third-order differences of the flow variables and scaled on the local spectral radii of the flux Jacobians, is introduced in the form of an added flux term [5].
For a control volume centered on the grid point i,j,k equation (3) takes the semi-discretized form

$$ \frac{d}{dt}\left( V_{CV} \, q_{i,j,k} \right) + Q_{i,j,k} - D_{i,j,k} = 0 \tag{4} $$

In equation (4) $Q_{i,j,k}$ is the discretized convective flux

$$ Q_{i,j,k} = \sum_{n=1}^{NF_{i,j,k}} (f, g, h)_n \cdot \mathbf{s}_n \tag{5} $$

where $NF_{i,j,k}$ is the number of cell faces forming the surface of the control volume centered on node i,j,k and having area vector s; the form of the dissipative flux $D_{i,j,k}$ is discussed in section 4.
Equation (4) is solved in time by a 5-stage Runge-Kutta scheme [6] whose coefficients are chosen in order to allow high stability limits (CFL = 4 on the linear convection equation) and large margins for numerical dissipation. Stability limits can be increased by a factor of two or three if residuals are smoothed by application of a suitable implicit operator at the end of each intermediate Runge-Kutta stage. Finally, a multigrid method for the reduction of low frequency errors [4] accelerates convergence to steady state.
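
The quoted stability limit can be illustrated with a few lines of arithmetic. The sketch below applies a five-stage scheme of the form q(m) = q(0) + α_m Δt R(q(m-1)) to the scalar model problem dq/dt = λq and evaluates the amplification factor for z = λΔt on the imaginary axis, where a central-difference convection operator has its eigenvalues. The coefficients 1/4, 1/6, 3/8, 1/2, 1 are an assumption: they are values commonly quoted for Jameson-type five-stage schemes, and the text does not list the coefficients actually used in [6].

```python
# Assumed stage coefficients (not given in the text).
ALPHAS = (1 / 4, 1 / 6, 3 / 8, 1 / 2, 1.0)


def amplification(z, alphas=ALPHAS):
    """Amplification factor of the multistage scheme for dq/dt = lambda*q, z = lambda*dt."""
    g = 1.0 + 0.0j
    for alpha in alphas:
        g = 1.0 + alpha * z * g   # each stage restarts from the initial value
    return g


for cfl in (3.0, 4.0, 4.5):
    print(cfl, abs(amplification(1j * cfl)))
```

With these assumed coefficients the modulus stays at or below one up to a Courant number of 4 and exceeds one beyond it, which is consistent with the limit stated above.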

2.2.  Domain  decomposition 

The computational domain is divided into smaller hexahedral structured blocks; each of the faces of a block is either part of a physical boundary or an interface to an adjacent block. An enlarged computational block is built, adding to the original core a two-layer halo extending into all the blocks sharing a boundary face, edge or corner with the original subdomain. Updated flow values are available in the halo regions when data exchange is performed. Although made up of structured parts, the enlarged subdomain can show locally unstructured regions at core edges or corners (figure 1). Equation (5) for the determination of the convective flux depends only on the determination of the number NF of cell faces that form the control volume surface; it can be applied straightforwardly to structured as well as to unstructured topologies. If updated values are available in all the enlarged computational blocks, identical values of $Q_K$ are computed for the different replicas of node K.

2.3. Local time step computation

The local time step must be computed from available data in each structured and unstructured control volume in a minimum number of computational steps in order to reduce the number of data exchanges. At the end of each step, updated flowfield quantities are available only in the core region of the block. Cell spectral radii are computed as the sum of contributions in each grid-coordinate direction: for the ξ-direction one obtains

$$ \lambda_\xi = \frac{ |\mathbf{u} \cdot \mathbf{s}_\xi| + a \, |\mathbf{s}_\xi| }{V} \tag{6} $$

and similar expressions for the η- and ζ-directions, which sum up into the local spectral radius

$$ \Lambda = \lambda_\xi + \lambda_\eta + \lambda_\zeta \tag{7} $$

To minimize data exchange needs in the computation of spectral radii at block interfaces, the product of cell contributions (6) and of cell volumes

$$ \pi_\xi = \lambda_\xi V = |\mathbf{u} \cdot \mathbf{s}_\xi| + a \, |\mathbf{s}_\xi| \tag{8} $$

is computed in each core region, and

$$ \Pi = \Lambda \cdot V = \pi_\xi + \pi_\eta + \pi_\zeta \tag{9} $$

is derived instead of (7). Data exchange of Π values is performed at this point: having built Π as a cell quantity instead of a nodal one, no averaging step is required and computational overhead is minimum.

The time step in the control volume relative to a node being

$$ \Delta t_{CV} = \frac{CFL}{\Lambda_{CV}} = \frac{CFL \cdot V_{CV}}{\sum_{n=1}^{NC} \Pi_n} \tag{10} $$

where NC is the number of cells which build up the control volume CV, a convenient modified time step is obtained from (9):

$$ \Delta t^{*} = \frac{\Delta t_{CV}}{V_{CV}} = \frac{CFL}{\Lambda_{CV} V_{CV}} = \frac{CFL}{\sum_{n=1}^{NC} \Pi_n} \tag{11} $$

where the average spectral radius in the control volume

$$ \Lambda_{CV} = \frac{1}{V_{CV}} \sum_{n=1}^{NC} \Lambda_n V_n \tag{12} $$

has been introduced, and Π values are available in the whole extended domain.

Modified time step (11) is directly introduced into the time-discretized form of (4)

$$ q^{n+1}_{i,j,k} = q^{n}_{i,j,k} - \frac{\Delta t_{CV}}{V_{CV}} \left( Q_{i,j,k} - D_{i,j,k} \right) = q^{n}_{i,j,k} - \Delta t^{*} \left( Q_{i,j,k} - D_{i,j,k} \right) \tag{13} $$

from which updated values of flow quantities are obtained.
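
A small numerical sketch may help to see why the cell quantity Π is convenient: the modified time step of a nodal control volume needs only the sum of the Π values of the cells around the node. The directional contribution |u·s| + a|s| used below is the usual form of a cell spectral radius multiplied by the cell volume and is an assumption here; the function names are illustrative.

```python
import math


def cell_pi(u, a, s_xi, s_eta, s_zeta):
    """Cell quantity Pi = Lambda * V, summed over the three grid directions.

    u is the cell velocity, a the speed of sound, s_* the mean face area vectors.
    The form |u.s| + a|s| per direction is an assumption.
    """
    def contribution(s):
        normal_velocity = abs(sum(ui * si for ui, si in zip(u, s)))
        return normal_velocity + a * math.sqrt(sum(si * si for si in s))
    return contribution(s_xi) + contribution(s_eta) + contribution(s_zeta)


def modified_time_step(cfl, cell_pis):
    """Modified time step (time step divided by control volume) of one node."""
    return cfl / sum(cell_pis)


# Eight identical unit-cube cells around an internal structured node.
pi = cell_pi(u=(1.0, 0.0, 0.0), a=1.0,
             s_xi=(1.0, 0.0, 0.0), s_eta=(0.0, 1.0, 0.0), s_zeta=(0.0, 0.0, 1.0))
dt_star = modified_time_step(cfl=4.0, cell_pis=[pi] * 8)
print(pi, dt_star)
```

For this configuration the control volume is 8 and the average spectral radius is 4, so the ordinary time step CFL / Λ equals 1 and dividing it by the control volume gives the same value as the direct formula. A node on an unstructured edge simply passes a list with a different number of cells.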

2.4. Multigrid residual driving

In  multigrid  methods,  the  residual  of  numerical 
solution  of  (4)  on  the  fine  grid  is  used  to  "drive" 
the  residual  evaluation  on  the  coarser  ones,  i.e. 
coarse  grid  steps  are  used  to  determine  corre¬ 
ctions  to  fine  grid  residuals  rather  than  comple¬ 
tely  new  residual  values. 

A  simple  algorithm  has  been  used  to  collect  fine 
grid  nodal  residuals  to  coarse  grid  nodal  "driver 
residuals"  with  small  computational  effort  and 
validity  on  structured  as  well  as  unstructured 
topological  entities. 

A fine grid cell residual is computed as

$$ R^{(F)}_{i+\frac{1}{2},\, j+\frac{1}{2},\, k+\frac{1}{2}} = \sum_{I=i}^{i+1} \sum_{J=j}^{j+1} \sum_{K=k}^{k+1} \frac{R^{(F)}_{I,J,K}}{NC_{I,J,K}} \tag{14} $$

where $NC_{I,J,K}$ is the number of cells sharing fine grid node I,J,K (figure 2-a). After cell data exchange, coarse grid nodal "driver residual" values are obtained by summing the contributions of the fine grid cells which share the coarse grid node:

$$ R^{(C)}_{I,J,K} = \sum_{n=1}^{NC_{I,J,K}} R^{(F)}_{n} \tag{15} $$

Fine grid cell values (14) contribute to one coarse grid node only and the algorithm guarantees correct evaluation of driver residuals on unstructured nodes automatically (figure 2-b).
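
The two-step collection can be sketched on a small two-dimensional, single-block analogue (the paper works in 3D and across blocks; the reduction to 2D is a simplification made here for brevity). Each nodal residual is first split equally among the cells sharing the node, then every fine cell hands its value to the single coarse node it touches.

```python
def collect_driver_residuals(R):
    """Fine nodal residuals R[i][j] -> coarse nodal driver residuals (2D sketch).

    The fine grid is assumed to have an odd number of nodes in each direction,
    and coarse nodes coincide with fine nodes of even index.
    """
    ni, nj = len(R), len(R[0])

    def cells_sharing(i, j):
        # 4 for an interior node, 2 on a boundary line, 1 at a corner.
        return (2 if 0 < i < ni - 1 else 1) * (2 if 0 < j < nj - 1 else 1)

    coarse = [[0.0] * ((nj + 1) // 2) for _ in range((ni + 1) // 2)]
    for i in range(ni - 1):
        for j in range(nj - 1):
            # Cell residual: nodal residuals shared out among neighbouring cells.
            cell = sum(R[I][J] / cells_sharing(I, J)
                       for I in (i, i + 1) for J in (j, j + 1))
            # Each fine cell has exactly one corner with even indices.
            coarse[(i + 1) // 2][(j + 1) // 2] += cell
    return coarse
```

Since every nodal residual is fully distributed and every cell contributes to exactly one coarse node, the sum of the driver residuals equals the sum of the fine grid residuals, whatever the number of cells meeting at a node.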


3.  BLOCK  INTERFACE  CONDITION 

3.1.  Data  exchange  strategies 

Contiguous blocks share nodes on boundary faces and/or edges and/or vertices, while cells belong to a single block only; in a cell-vertex formulation, where flow variables are defined at nodes, different values may be computed in replicas of the same boundary node owned by different blocks. The PARAGRID framework ensures that the same average value is assigned to all such replicas of a boundary node at the end of each block update step, when data exchange between blocks is performed.
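
The averaging of replicas can be pictured with the following minimal sketch. The dictionary layout (node identifier mapped to per-block values) is a hypothetical illustration and does not represent the PARAGRID data structures or interface.

```python
def average_replicas(replicas):
    """Assign the mean value to every replica of each shared boundary node.

    replicas maps a node identifier to {block identifier: value}.
    """
    averaged = {}
    for node, values in replicas.items():
        mean = sum(values.values()) / len(values)
        averaged[node] = {block: mean for block in values}
    return averaged


# A node on an edge shared by three blocks, with slightly different values.
print(average_replicas({"node_7": {"A": 1.0, "B": 2.0, "C": 3.0}}))
```

The same routine works for a node shared by two blocks (face), four or five blocks (edge) or more (corner), since it makes no assumption on the number of replicas.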

Three different implementations of the multigrid algorithm have been studied. In the first implementation the block update step includes a full multigrid cycle, with frozen halo data. In the second implementation the block update step includes the five-stage Runge-Kutta cycle on a grid level and the restriction/prolongation of solution and residuals to the next grid level. In the third implementation the block update step only includes a single Runge-Kutta stage on the current grid level.

The  first  strategy  has  minimum  memory  requi¬ 
rements  and  maximum  parallel  efficiency  but 
leads  to  an  inconsistency  in  the  computation  of 
the  flow  field  at  internal  boundaries.  In  this  case, 
the  averaging  process  is  applied  to  the  flow 
quantities  associated  with  all  replicas  of  a  boun¬ 
dary  node. 

In principle, the third strategy ensures identical values for the different replicas of a boundary node, at the price of larger memory requirements and overheads due to the more frequent exchange of halo data. In practice, small discrepancies in boundary node replicas still occur, due to the implicit nature of the residual smoothing phase, which is confined to work at the block level. With this choice the averaging process is applied to the residuals rather than the flow quantities.

The second choice represents a compromise between the previous two: halo flow values are still frozen during the time integration, but more frequent data exchange between blocks strongly reduces the generation of interface errors; on the other hand, the solution is faster and requires less memory than the exact treatment.

3.2.  Numerical  experiments 

Numerical  experiments  show  that  numerical 
errors  introduced  by  the  first  interface  condition 
reduce  local  stability  margins  and  put  severe 
restrictions  on  the  block  subdivision  of  the  grid: 
block  interfaces  falling  in  the  middle  of  strong 
gradient  regions  can  often  lead  to  divergence  of 
the  computation. 

A simple geometry, consisting of a cylindrical body ending with a spherical cap of unit radius, has been chosen to investigate the limits of the examined strategies.

Figure  3  shows  different  topologies  used  in  the 
analysis  of  the  blunt-nosed  body  at  a  Mach 
number  of  2  and  zero  incidence.  The  block 
interface  in  grid  "A"  crosses  intentionally  the 
bow  shock  close  to  the  symmetry  axis,  where 
shock  intensity  is  higher;  in  the  grid  "B"  the 
division  surface  has  been  moved  upstream. 
Single-block  solutions  are  compared  with  multi¬ 
block  solutions  obtained  by  application  of  diffe¬ 
rent  interface  treatments;  all  computations  have 
been  run  for  100  multigrid  steps  with  3-level  W- 
cycle,  after  50  +  50  initialization  steps  on  two 
coarser  grid  levels.  They  have  all  been  performed 
in  single  precision. 

Single-block computations (figure 4-a) show a bow-shock located in front of the nose, at a distance of approx. 0.35 nose radii; the maximum pressure coefficient at stagnation is Cp = 1.63 and the drag coefficient is Cd = 0.7156 with reference to the cross section area.

Figure 4-b shows the flow pattern resulting from all the converged computations on the 4-block grid "A": the block interface passes at (x = -1.35, y = 0), i.e. where the bow-shock intensity is maximum, but the position of the bow shock is identical, Cp = 1.63 and Cd = 0.7756.

Flow variable exchange and averaging at the end of the multigrid cycle (figure 5) leads to divergence on the 4-block grid "A" at a CFL number of 8 and forces either a reduction of the CFL number to 6 or the adoption of the modified grid "B". Figure 6 shows that, at CFL = 8, data exchange at the end of each Runge-Kutta cycle leads to convergence in 50 % more steps than exchange at each intermediate stage. Figure 7 shows that single and multiblock computations are equivalent in the latter case, and that the interface boundary condition becomes completely transparent to the computation.

A simulation of the transonic vortical flow around a wing-body-canard sharp leading edge configuration, at a Mach number of 0.85 and an incidence of 10°, has been obtained from a multiblock computation with data exchange at each Runge-Kutta stage, and compared with single-block results [7].

Pressure plots in the cross flow (figure 8-a) and on the wing surface (figure 8-b) at 0.6 wing chords show that block decomposition has only a slight influence on the position or intensity of the vortices, although block interfaces cross both the wing leading edge and the canard vortex.


4.  NUMERICAL  DISSIPATION 

The dissipative flux $D_{i,j,k}$ in equation (4) is based on a background term, dependent on the third order difference of the flow variables scaled on the local spectral radii of the flux Jacobians. A sensor based on the local pressure gradient switches on a first order difference term in the presence of flow discontinuities.

On a structured grid the dissipative flux is

$$ D_{i,j,k} = d_{i+\frac{1}{2},j,k} - d_{i-\frac{1}{2},j,k} + d_{i,j+\frac{1}{2},k} - d_{i,j-\frac{1}{2},k} + d_{i,j,k+\frac{1}{2}} - d_{i,j,k-\frac{1}{2}} \tag{16} $$

and each mixed first- and third-order difference term [4,6] based on the local curvilinear coordinate system is

$$ d_{i+\frac{1}{2},j,k} = \lambda_\xi \left[ \varepsilon^{(2)} \left( q_{i+1,j,k} - q_{i,j,k} \right) - \varepsilon^{(4)} \left( q_{i+2,j,k} - 3 q_{i+1,j,k} + 3 q_{i,j,k} - q_{i-1,j,k} \right) \right] \tag{17} $$

scaled on local spectral radii components (6); numerical viscosity coefficients are based on the pressure sensor

$$ \nu_{i,j,k} = \frac{ \left| p_{i+1,j,k} - 2 p_{i,j,k} + p_{i-1,j,k} \right| }{ p_{i+1,j,k} + 2 p_{i,j,k} + p_{i-1,j,k} } \tag{18} $$

and take the form

$$ \varepsilon^{(2)} = \mu^{(2)} \min\left( \tfrac{1}{4}, \nu \right), \qquad \varepsilon^{(4)} = \max\left( 0, \mu^{(4)} - \varepsilon^{(2)} \right) \tag{19} $$

The above formulation cannot be consistently applied to the nodes of block edges and corners where the block topology is locally unstructured: the dissipation computed for different replicas of such boundary nodes on the basis of equation (16) would span different sets of neighbouring nodes, thus leading to an inconsistency.

An unstructured formulation derived from the work of Mavriplis [8] has been tested to evaluate improvements in the analysis of flows in these regions.

An approximation to the Laplacian at the boundary node K=(i,j,k) is constructed as

$$ \nabla^2 q_K = \sum_{j=1}^{n} \left( q_j - q_K \right) = \sum_{j=1}^{n} q_j - n \, q_K \tag{20} $$

where the summation in (20) is performed over all the n nodes connected by a cell edge to node K.

In this case the dissipative flux becomes the sum of a Laplacian and a biharmonic operator
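
$$ D_{i,j,k} = D_K = \Lambda \left[ - \varepsilon^{(4)} \sum_{j=1}^{n} \left( \nabla^2 q_j - \nabla^2 q_K \right) + \varepsilon^{(2)} \sum_{j=1}^{n} \left( q_j - q_K \right) \right] \tag{21} $$

where Λ is the spectral radius in the control volume relative to node K=(i,j,k), the pressure sensor is

$$ \nu_K = \frac{ \left| \sum_{j=1}^{n} \left( p_j - p_K \right) \right| }{ \sum_{j=1}^{n} \left( p_j + p_K \right) } \tag{22} $$

and the nodal numerical viscosity coefficients, as in (19), are

$$ \varepsilon^{(2)} = \mu^{(2)} \min\left( \tfrac{1}{4}, \nu_K \right), \qquad \varepsilon^{(4)} = \max\left( 0, \mu^{(4)} - \varepsilon^{(2)} \right) \tag{23} $$

Coefficients $\mu^{(2)}$ and $\mu^{(4)}$ can be set in both formulations to obtain desired properties of convergence and damping. Optimal convergence was obtained in this case with values $\mu^{(2)} = 1$ and $\mu^{(4)} = 1/32$ in the structured formulation, $\mu^{(2)} = 1/2$ and $\mu^{(4)} = 15/1024$ in the unstructured formulation.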
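
The edge-based Laplacian (20) and pressure sensor (22) depend only on the list of nodes connected to node K, which is what makes them applicable where five blocks meet. The sketch below uses a plain adjacency dictionary; the names and data layout are illustrative assumptions, and the scaling by spectral radius and viscosity coefficients is deliberately left out.

```python
def edge_laplacian(values, neighbours, k):
    """Sum of (q_j - q_K) over the nodes j joined to node k by a cell edge."""
    return sum(values[j] for j in neighbours[k]) - len(neighbours[k]) * values[k]


def pressure_sensor(p, neighbours, k):
    """Normalised pressure Laplacian at node k, used to detect discontinuities."""
    numerator = abs(sum(p[j] - p[k] for j in neighbours[k]))
    denominator = sum(p[j] + p[k] for j in neighbours[k])
    return numerator / denominator


# Node 0 with five edge neighbours (an illustrative unstructured stencil).
neighbours = {0: [1, 2, 3, 4, 5]}
smooth = [1.0] * 6                       # uniform pressure
jump = [1.0, 2.0, 2.0, 2.0, 2.0, 2.0]    # pressure rise around node 0
print(pressure_sensor(smooth, neighbours, 0), pressure_sensor(jump, neighbours, 0))
```

The sensor vanishes in a uniform field and grows with the pressure variation around the node, independently of how many neighbours the node has.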

A different grid around the blunt-nosed body has been generated: a 7-block grid showing an unstructured edge, shared by five neighbouring blocks, in the vicinity of the bow shock wave. Figure 9 shows the block decomposition and the flow pattern at Mach 2 and zero incidence; a slight deflection of the shock wave is present at the unstructured edge, but it should be ascribed to the unsuitable cell distribution in the zone. Computations have been carried out with both dissipation schemes; plots of the logarithm of the density residual at the unstructured edge nodes are compared in figure 10, showing that errors due to inconsistent computation of dissipative fluxes (17) prevent complete convergence, even if variable averaging is performed at each Runge-Kutta stage; the unstructured formulation (21) leads to complete, although slower, convergence.

5.  TIME  AND  MEMORY  REQUIREMENTS 

CPU time requirements have been measured by serial runs of the blunt-nosed body test case on an IBM RISC 6000 550. These measurements, together with the data on memory occupation, obviously depend on the code FLO67P-2 [9] and on the PARAGRID framework: they are presented here mostly as a qualitative comparison between the multiblock algorithms previously discussed.

Table 1 shows the CPU times and RAM occupation needed by the various proposed strategies; times are expressed as CPU seconds per node per iteration, memory occupation is expressed as Mbytes per thousand nodes.

Exchange at multigrid level needs 20 % less time than exchange at each Runge-Kutta stage; this partly compensates for the reduction in CFL number. In addition, it needs less than 50 % of the memory. Memory occupation can in all cases be reduced by 0.08 Mbyte/knode if metric coefficients are recomputed at the beginning of each block update step, at the price of higher time requirements.
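
The two percentages follow directly from the structured-dissipation rows of Table 1. The short check below uses only the leading digits of the CPU times, since the common power-of-ten factor cancels in the ratio; the dictionary keys are illustrative labels.

```python
# Leading digits of CPU time per node per iteration (common exponent omitted).
cpu_time = {"multigrid cycle": 4.47, "Runge-Kutta cycle": 4.75, "Runge-Kutta stage": 5.55}
# Memory occupation in Mbyte per thousand nodes.
memory = {"multigrid cycle": 0.195, "Runge-Kutta cycle": 0.276, "Runge-Kutta stage": 0.404}

time_saving = 1.0 - cpu_time["multigrid cycle"] / cpu_time["Runge-Kutta stage"]
memory_ratio = memory["multigrid cycle"] / memory["Runge-Kutta stage"]
print(f"time saving: {time_saving:.1%}, memory ratio: {memory_ratio:.1%}")
```

The same arithmetic applied to the Runge-Kutta cycle row places that strategy between the two extremes in both time and memory.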


6.  CONCLUSIONS 

The  determination  of  the  most  convenient  multi¬ 
block  solution  strategy  among  the  examined 
algorithms  is  not  immediate. 

Numerical  experiments  of  section  3.2  and  4  show 
that,  if  data  exchange  is  performed  at  each 
intermediate  stage,  interface  condition  has  no 
impact  on  stability  limits  and  convergence  rate; 
other  conditions  generate  a  reduction  of  conver¬ 
gence  rate  and,  in  the  case  of  exchange  only  at 
the  end  of  the  multigrid  cycle,  of  stability  limits. 
On  the  other  hand,  approximate  solutions  at 
block  interfaces  yield  a  reduction  in  time  and 
mostly  in  memory  needs. 

Exchanging halo data at Runge-Kutta level is a compromise solution which retains the stability bounds of the exact description with reduced time and memory requirements, at the price of slower convergence.

Data exchange        Numerical       CPU time              RAM
at each              dissipation     (s pnts^-1 it^-1)     (Mbyte kpnts^-1)

Multigrid cycle      structured      4.47 x 10^-4          0.195
Runge-Kutta cycle    structured      4.75 x 10^-4          0.276
Runge-Kutta stage    structured      5.55 x 10^-4          0.404
Runge-Kutta stage    unstructured    5.19 x 10^-4          0.462

Table 1 Time and memory requirements for the examined multiblock algorithms.


Figure  1  Example  of  unstructured  local  topology  in  a  multiblock  structured  grid:  edge  shared  by 
five  blocks. 
Figure 5  Convergence history of maximum density residual for multiblock solution of the flow
around a blunt nosed body (Mach = 2, α = 0): data exchange performed at the end of each
multigrid cycle.


Figure 6  Convergence history of maximum density residual for multiblock solution of the flow
around a blunt nosed body (Mach = 2, α = 0): behaviour of different strategies for data
exchange between grid blocks at CFL = 8.


Figure 7  Convergence history of maximum density residual for multiblock solution of the flow
around a blunt nosed body (Mach = 2, α = 0): comparison between single-block and
multiblock computations at CFL = 6.


Flow around a wing-body-canard configuration (Mach = 0.85, α = 10°): a) iso-pressure plot
on wing and canard; b) iso-pressure plot at 0.6 wing chords from a single-block solution
[7]; c) iso-pressure plot at 0.6 wing chords from a solution on a 32-block decomposition
of the single-block grid; d) pressure coefficient on wing surface at 0.6 wing chords.
RECENT IMPROVEMENTS TO THE COMPRESSIBLE FLOW
SOLVER FLU3M


L. Cambier, D. Darracq (1), M. Gazaix, Ph. Guillen, Ch. Jouet, L. Le Toullec
ONERA, B.P. 72, 92322 Chatillon Cedex, France.


Abstract

We present three developments which have been introduced
in the code FLU3M.

A numerical method for solving the unsteady Euler
equations on time-varying rigid grids is studied first;
it uses the van Leer scheme together with a
second-order-in-time implicit algorithm.

A two-dimensional nozzle and an afterbody shape have
been computed with the Jones-Launder k-ε model, whose
implementation in the code is described for
one- and two-species gases.

Finally, a new implicit algorithm is presented: the
DDLU factorization reduces both CPU time and memory
cost compared with the ADI factorization.

Résumé

Three developments carried out in the code FLU3M
are presented. A method for solving the unsteady Euler
equations for rigid-body motion is studied first; it
uses the van Leer scheme together with a
second-order-in-time implicit approach that reduces
computing times.

A two-dimensional nozzle and an afterbody have been
computed with the Jones-Launder k-ε turbulence model,
whose implementation in the code is described for
single-species and two-species flows. Comparisons
with experiment are made.

A new implicit solution algorithm has then been
studied; the DDLU factorization yields savings in
computing time and memory compared with an
ADI factorization.

1. Introduction

Since 1987, a multi-domain, multi-species aerodynamic
computation code (FLU3M) has been under development
in the Theoretical Aerodynamics Division 1 of ONERA.

In 1989, the main numerical choices and the software
structure of the code were published at the international
seminar in Boston [1]. Perfect-gas and equilibrium
real-gas Euler computations were presented there on
multi-domain configurations such as the Hermes shuttle;
since the flows are supersonic, space-marching
techniques were used. The capability for two-species
gas computations was illustrated by a hot-jet computation.

Since then, FLU3M has served as the basis for many
developments, both in physical modelling and in
numerical techniques that improve the accuracy and
speed of the computations.

Thus, the Navier-Stokes equations in the laminar regime
are now solved numerically. The code has been tested
on several validation cases: for example, a 3D hypersonic
ramp presented at the Antibes Workshop [2], or an
ogive-cylinder configuration with vortical flow [3].
For hypersonic flows, a new Mollier diagram has been
studied [4]; in addition to the thermodynamic properties
of equilibrium air, it provides the viscosities and
conductivities needed for Navier-Stokes computations.

New possibilities for spatial discretization have been
explored, in particular chimera grid techniques. Complex
computations (missile separation) can thus be carried
out more easily [5].

We present here in more detail three lines of
development. These developments, carried out within a
single code, are made easier by the high modularity of
the code and by the clarity of its tree structure.

One line of development is related to the study of
aeroelastic phenomena. The implementation of the
unsteady Euler equations on a moving grid is presented,
together with several second-order-in-time approaches
that reduce computational cost.

Within the work on turbulence models, we describe the
introduction of a two-equation k-ε transport model, for
a single-species perfect gas or a two-species gas.

Finally, a new algorithm for solving the implicit system
is presented. We study the DDLU factorization, which
reduces memory and computing time compared with an
ADI factorization.

(1) Doctoral student under a CIFRE agreement with SNECMA



2. Unsteady computations on moving grids

As part of aeroelasticity studies for Ariane-type
launchers, an unsteady Euler numerical method has been
developed in FLU3M. After the formulation of the
unsteady equations, the various numerical choices are
described; then the computation of a NACA airfoil in
pitching motion is presented, together with that of a
3D sphere-cone oscillating about its centre of gravity.

2.1 Unsteady equations

Consider an airfoil in pitching motion, fitted with a
grid $\Omega(t)$. In this study, the domain $\Omega(t)$,
assumed non-deformable, moves with respect to an
absolute frame $\mathcal{R}_0$.

On $\Omega(t)$, the Euler equations are written in
conservation-law form:

$$\frac{d}{dt}\int_{\Omega(t)} W(\vec x,t)\,d\tau + \int_{\partial\Omega(t)} F(W,\vec n)\,dS = 0 \qquad (1)$$

with $W = (\rho, \rho\vec v, \rho E)$ the conservative
variables ($\rho$: density, $\vec v$: absolute velocity,
$E$: total energy) and

$$F(W,\vec n) = \begin{pmatrix} \rho\,\vec v_r\cdot\vec n \\ \rho\vec v\,(\vec v_r\cdot\vec n) + p\,\vec n \\ \rho E\,(\vec v_r\cdot\vec n) + p\,\vec v\cdot\vec n \end{pmatrix} \qquad (2)$$

where $\vec v_e$ is the entrainment (grid) velocity and
$\vec v_r = \vec v - \vec v_e$ the relative velocity.
Unlike other approaches that use relative velocities,
the computational variables are the absolute variables,
that is, the absolute velocities expressed in the
absolute frame $\mathcal{R}_0$. This is a classical
approach which, compared with the fixed-grid equations,
requires a modification of the numerical fluxes, which
involve the entrainment velocity $\vec v_e$, together
with a metric computation that varies in time.
To discretize the fluxes we use upwind methods;
Vinokur gives a detailed analysis of them in [7].

2.2 Flux discretization

It can be checked that the flux Jacobian, with rows
ordered as the mass, momentum and energy fluxes and
columns as $\rho$, $\rho\vec v$, $\rho E$, is:

$$\frac{\partial F}{\partial W} = \begin{pmatrix} -v_{en} & \vec n^{\,T} & 0 \\ (\gamma-1)\frac{v^2}{2}\,\vec n - v_n\vec v & -(\gamma-1)\,\vec n\otimes\vec v + \vec v\otimes\vec n + (v_n - v_{en})\,I & (\gamma-1)\,\vec n \\ (\gamma-1)\frac{v^2}{2}\,v_n - H v_n & H\,\vec n^{\,T} - (\gamma-1)\,v_n\,\vec v^{\,T} & \gamma v_n - v_{en} \end{pmatrix}$$

with $v_n = \vec v\cdot\vec n$ and $v_{en} = \vec v_e\cdot\vec n$.
The eigenvalues of the flux Jacobian are:

$$\lambda = \vec v_r\cdot\vec n \ (\text{multiplicity } 3), \qquad \lambda = \vec v_r\cdot\vec n + c \ (\text{multiplicity } 1), \qquad \lambda = \vec v_r\cdot\vec n - c \ (\text{multiplicity } 1) \qquad (3)$$

The eigenvectors of the flux Jacobian are exactly the
same as for the fixed-grid equations.

To discretize the fluxes, the van Leer splitting can be
extended to the unsteady Euler equations. Van Leer
splits the flux as $f = f^+ + f^-$, where $f^+$ (resp.
$f^-$) has positive (resp. negative) eigenvalues.

Introducing the normal relative Mach number
$M_{rn} = v_{rn}/c$, with $v_{rn} = \vec v_r\cdot\vec n$, we have:

•  If $M_{rn} \ge 1$, $f^+ = f$.

•  If $M_{rn} \le -1$, $f^- = f$.

•  For $|M_{rn}| < 1$, $f^+$ is given by:

$$f_1^+ = \frac{\rho c}{4}\,(M_{rn}+1)^2$$

$$f_{2,3,4}^+ = f_1^+\left[\frac{-v_{rn}+2c}{\gamma}\,\vec n + \vec v\right]$$

$$f_5^+ = f_1^+\left[\frac{[(\gamma-1)v_{rn}+2c]^2}{2(\gamma^2-1)} + \frac{v^2 - v_{rn}^2}{2}\right] + f_1^+\,\frac{-v_{rn}+2c}{\gamma}\,v_{en}$$

The van Leer flux is then written:

$$F_{\text{van Leer}}(W_g, W_d) = f^+(W_g) + f^-(W_d)$$

where $W_g$ and $W_d$ are the left and right states.
For use with implicit methods, this flux must be
linearized.

2.3 Unsteady metric

As the airfoil moves, the grid moves with it and the
interface normals must therefore be computed at every
time level. By assumption the grid does not deform, so
the cell volumes do not change; they are computed once
and for all at the initial time $t_0$.

The mean of $\vec n$ over a time step $\Delta t$ must be
evaluated, which is done by considering the time
$t_{n+1/2}$. We then have

$$\vec n(t_{n+1/2}) = R(t_{n+1/2})\cdot\vec n(t_0)$$

where $R$ is the rotation matrix of the motion taken at
time level $n+\frac12$. The flux computation also
requires the entrainment velocity at the cell interfaces
at time $t_{n+1/2}$. Knowing the coordinates at $t = 0$
of the centre $I$ of an interface, the entrainment
velocity is given by:

$$\vec v_e(I) = \vec v_e(A) + \Omega\cdot\overrightarrow{AI} \qquad (4)$$

where $\Omega$ is the rotation-rate tensor of the solid
and $A$ a point of the solid.

This completes the data required for an unsteady
computation on a rigid grid.
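
The metric update for a rigid pitching grid can be illustrated in two dimensions. The Python sketch below rotates an initial interface normal and evaluates the rigid-body entrainment velocity at an interface centre. It is only an illustration under assumptions that are not in the text: the pitch axis is taken as the fixed point $A$ (so $\vec v_e(A) = 0$), the angle is counted positive counter-clockwise, and the function names are hypothetical.

```python
import numpy as np


def rotation(angle):
    """2D rotation matrix R for a counter-clockwise angle in radians."""
    c, s = np.cos(angle), np.sin(angle)
    return np.array([[c, -s], [s, c]])


def pitching_metric(n0, x0, x_axis, alpha, alpha_dot):
    """Normal and entrainment velocity of an interface on a rigid grid.

    n0, x0    : interface normal and centre at the initial time
    x_axis    : pitch axis, assumed fixed in the absolute frame
    alpha     : pitch angle at the half time level
    alpha_dot : pitch rate at the same instant
    """
    R = rotation(alpha)
    n = R @ n0                        # n(t) = R(t) . n(t0)
    r = R @ (x0 - x_axis)             # vector from the axis to the centre
    v_e = alpha_dot * np.array([-r[1], r[0]])   # Omega . AI in 2D
    return n, v_e


n0 = np.array([0.0, 1.0])
x0 = np.array([1.0, 0.2])
x_axis = np.array([0.25, 0.0])
alpha, alpha_dot = np.radians(1.01), 0.5

n, v_e = pitching_metric(n0, x0, x_axis, alpha, alpha_dot)
print(n, v_e)
```

Because the motion is a pure rotation, the normal keeps unit length and the entrainment velocity is perpendicular to the vector joining the axis to the interface centre.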

2.4 Unsteady boundary conditions

Consider the example of an airfoil with a C-type grid,
moving with respect to $\mathcal{R}_0$. A uniform flow
is imposed upstream of the airfoil.

Two boundary conditions have to be considered: on the
one hand, at the far-field boundaries, to impose a flow
while avoiding wave reflections; on the other hand, at
the wall, where a slip condition must be imposed.

1. Wall boundary condition.

From the slip condition, $\vec v_r\cdot\vec n = 0$, where
$\vec v_r$ is the relative velocity. The wall flux becomes:

$$F_{\text{wall}} = \begin{pmatrix} 0 \\ p\,\vec n \\ p\,\vec v_e\cdot\vec n \end{pmatrix}$$

where $p$ is the pressure at the interface, computed by
extrapolation, possibly corrected by a characteristic
relation.

2. Inflow-outflow boundary condition.

We adopt the formulation proposed by Collercandy [9],
extended to the moving-grid equations.

Five characteristics with slopes $\lambda$,
$\lambda \in \{v_{rn},\ v_{rn}+c,\ v_{rn}-c\}$, reach the
interface at time level $n+1$. Depending on the sign of
the slope $\lambda$, the associated characteristic
variable is computed from an exterior or an interior state.

More precisely, the eigenvalues are computed from a
mean state

$$W_m = \tfrac12\,(W_{\text{interior}} + W_{\text{exterior}})$$

which gives the eigenvalues and their signs. The
characteristic variables associated with these
eigenvalues are then computed from the interior and
exterior states.

If $\lambda$ is negative, the exterior characteristic
variable is chosen; otherwise the interior
characteristic variable is taken.


2.5 Time discretization

The time accuracy of the schemes used for unsteady
flows is an important point. Several first- or
second-order schemes in time are described and their
stability properties briefly recalled. The equation to
be discretized in time is:

$$\frac{dW}{dt} = g(W) \qquad (5)$$

2.5.1 First-order-accurate schemes

The explicit scheme $W^{n+1} = W^n + \Delta t\,g^n$ is
first-order accurate. It is stable under the condition
cfl $\le 1$, which in practice imposes very small time
steps. Indeed, to compute on a fairly fine grid a
NACA64A010 airfoil oscillating in a flapping motion at a
frequency of 100 Hz, 40 000 explicit time steps are
needed to complete one cycle.

The implicit scheme $W^{n+1} = W^n + \Delta t\,g^{n+1}$ is
first-order accurate. The function $g^{n+1}$ is
linearized as $g^{n+1} \approx g^n + A^n\,(W^{n+1} - W^n)$,
with $A^n$ the Jacobian of $g$ at $W^n$, which leads to
the scheme $(I - \Delta t\,A^n)(W^{n+1} - W^n) = \Delta t\,g^n$.
This scheme is unconditionally stable. To obtain the
same accuracy as an explicit computation on quantities
such as lift and moment, cfl values of the order of 100
can be used for oscillating airfoils.

2.5.2 Second-order-accurate schemes

Iannelli-Baker implicit Runge-Kutta scheme
This is a Runge-Kutta scheme with two implicit
stages [10]:

$$(I - \alpha\,\Delta t\,A^n)\,\Delta W_1 = g(W^n)\,\Delta t \qquad (6)$$

$$(I - \alpha\,\Delta t\,A^n)\,\Delta W_2 = g(W^n + \beta\,\Delta W_1)\,\Delta t \qquad (7)$$

$$W^{n+1} = W^n + \gamma_1\,\Delta W_1 + \gamma_2\,\Delta W_2 \qquad (8)$$

with

$$\alpha = \frac{2-\sqrt2}{2}, \qquad \beta = 2\,(3\sqrt2 - 4) \qquad (9)$$

$$\gamma_1 = \frac{6-\sqrt2}{8}, \qquad \gamma_2 = \frac{2+\sqrt2}{8} \qquad (10)$$

This scheme is unconditionally stable, with cfl values
much larger than those used with a first-order implicit
scheme, of the order of 400, again for results of
accuracy equivalent to that of an explicit computation.
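
The difference between the first-order linearized implicit scheme and the two-stage implicit Runge-Kutta scheme can be seen on a scalar model problem. The Python sketch below integrates $dW/dt = -W^2$, $W(0) = 1$ (exact solution $1/(1+t)$) with both schemes and compares the error reduction when the time step is halved. The model problem is an assumption chosen for illustration, and the coefficient values used are those that satisfy the second-order conditions $\gamma_1 + \gamma_2 = 1$ and $\alpha + \gamma_2\beta = 1/2$ for the stage equations of this section.

```python
import math

# Coefficients of the two-stage implicit Runge-Kutta scheme; they satisfy
# GAMMA1 + GAMMA2 = 1 and ALPHA + GAMMA2 * BETA = 1/2 (second order).
ALPHA = (2.0 - math.sqrt(2.0)) / 2.0
BETA = 2.0 * (3.0 * math.sqrt(2.0) - 4.0)
GAMMA2 = (2.0 + math.sqrt(2.0)) / 8.0
GAMMA1 = 1.0 - GAMMA2


def g(w):
    """Right-hand side of the model problem dW/dt = -W**2."""
    return -w * w


def jac(w):
    """Jacobian dg/dW of the model problem."""
    return -2.0 * w


def implicit_euler_step(w, dt):
    """Linearized first-order implicit step: (1 - dt*A) dW = dt*g."""
    return w + dt * g(w) / (1.0 - dt * jac(w))


def rk2_implicit_step(w, dt):
    """Two-stage step; both stages share the same implicit operator."""
    d = 1.0 - ALPHA * dt * jac(w)
    dw1 = dt * g(w) / d
    dw2 = dt * g(w + BETA * dw1) / d
    return w + GAMMA1 * dw1 + GAMMA2 * dw2


def error_at_t1(step, n_steps):
    """Absolute error at t = 1 against the exact value 1/(1+t) = 0.5."""
    w, dt = 1.0, 1.0 / n_steps
    for _ in range(n_steps):
        w = step(w, dt)
    return abs(w - 0.5)


ratio_order1 = error_at_t1(implicit_euler_step, 50) / error_at_t1(implicit_euler_step, 100)
ratio_order2 = error_at_t1(rk2_implicit_step, 50) / error_at_t1(rk2_implicit_step, 100)
print(f"error ratio, first-order scheme : {ratio_order1:.2f}")
print(f"error ratio, two-stage scheme   : {ratio_order2:.2f}")
```

Halving the time step should roughly halve the error of the first-order scheme and divide the error of the two-stage scheme by about four, which is the practical meaning of the larger usable cfl quoted for the second-order method.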


2.6 Validation cases
2.6.1 NACA64A010 airfoil

The airfoil chosen is a NACA64A010, under the following
experimental conditions (Fig. 1):

M                       0.796
p_inf                   203321 Pa
alpha_0                 1.01°
f                       34.4 Hz
k (reduced frequency)   0.404
x_g / L_ref             0.25

The law of motion of the airfoil is given by:

$$\alpha(t) = \alpha_0\,\sin(k\,\tau)$$

The airfoil was defined in an AGARD report [11]
and has been computed by many authors [8].

For each unsteady test, the evolution of the lift and of
the moment is presented. The computations with the
various time-integration approaches are compared with
experiment and with a reference explicit computation at
cfl = 0.8.

Figures 2 and 3 show the first-order and second-order
computations in time for a cfl of 400. For the moment,
the first-order approach does not give the same result
as the explicit reference computation, whereas the
second-order approach gives an identical result.

At cfl = 100, for the first-order method, the
computational cost is divided by 17 compared with an
explicit method at cfl = 1 (400 time steps per cycle
against 40 000 for the explicit method). The cost of the
second-order implicit Runge-Kutta method is practically
identical to that of the first-order method, since the
implicit matrix is the same in the two stages.

In conclusion, the second-order-in-time implicit method
is the most suitable for unsteady computations.
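
The cost figures quoted for the implicit schemes can be related by simple arithmetic. The Python sketch below derives the implied cost of one implicit step from the factor of 17 at cfl = 100, then extrapolates to cfl = 400. It assumes, as a simplification not stated in the text, that the number of time steps per cycle scales as 1/cfl and that a second-order step costs the same as a first-order one.

```python
# Figures taken from the text.
explicit_steps = 40_000          # explicit steps per cycle at cfl = 1
implicit_steps_cfl100 = 400      # implicit steps per cycle at cfl = 100
speedup_cfl100 = 17.0            # cost ratio explicit / implicit at cfl = 100

# Implied cost of one implicit step, in units of one explicit step.
step_cost = explicit_steps / implicit_steps_cfl100 / speedup_cfl100

# Assumption: steps per cycle scale as 1/cfl, and the second-order step
# costs about the same as the first-order one.
implicit_steps_cfl400 = implicit_steps_cfl100 * 100 / 400
speedup_cfl400 = explicit_steps / (implicit_steps_cfl400 * step_cost)

print(f"cost of one implicit step : {step_cost:.1f} explicit steps")
print(f"estimated gain at cfl=400 : {speedup_cfl400:.0f}")
```

Under these assumptions the estimated gain at cfl = 400 is of the same order as the factor of about 70 quoted in the conclusion of the paper.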

2.6.2 Aerospatiale sphere-cone
A 3D sphere-cone body has also been computed
(Fig. 4); the body oscillates about its centre of
gravity G. The free-stream Mach number is 7. The
maximum pitch angle is $\alpha_0 = 1°$. The law of
motion is given by:

$$\alpha(t) = \alpha_0\,\sin(k\,\tau)$$

with reduced frequency $k = 0.386$. An unsteady
computation at cfl = 100 with the implicit Runge-Kutta
method has been carried out (Fig. 4). A comparison was
made with another computation performed by
Aerospatiale [?]. A difference of 8% is observed on
$C_{m\alpha}$, whereas $C_{m\alpha} + C_{mq}$ is
identical in the two computations (Fig. 5).

3. Turbulence modelling

3.1 Two-equation transport model
The Jones-Launder (k, ε) turbulence model [13] is
implemented in the FLU3M code. The two transport
equations for $\rho k$ and $\rho\varepsilon$ are:

$$\partial_t(\rho k) + \mathrm{div}(\rho k\,\vec v) = \tau_R : \nabla\vec v + \mathrm{div}\!\left[\left(\mu + \frac{\mu_t}{\sigma_k}\right)\nabla k\right] - \rho\varepsilon + D_k \qquad (11)$$

$$\partial_t(\rho\varepsilon) + \mathrm{div}(\rho\varepsilon\,\vec v) = C_{\varepsilon 1}\,\frac{\varepsilon}{k}\,\tau_R : \nabla\vec v + \mathrm{div}\!\left[\left(\mu + \frac{\mu_t}{\sigma_\varepsilon}\right)\nabla\varepsilon\right] - C_{\varepsilon 2}\,f_2\,\frac{\varepsilon^2}{k}\,\rho + D_\varepsilon \qquad (12)$$

The turbulent viscosity coefficient $\mu_t$ is:

$$\mu_t = C_\mu\,f_\mu\,\frac{\rho k^2}{\varepsilon} \qquad (13)$$

The following choice was made for the model
coefficients: $C_{\varepsilon 1} = 1.57$, $C_{\varepsilon 2} = 2.$,
$\sigma_k = 1.$, $\sigma_\varepsilon = 1.3$, $C_\mu = 0.09$.
The terms $D_k$ and $D_\varepsilon$ are additional terms
linked to the low-Reynolds formulation and intended to
represent the damping of turbulence near walls. In the
paper by Jones and Launder [13], these terms are
expressed in a boundary-layer frame. For the solution
of the Navier-Stokes equations, we have considered the
following expressions for $D_k$ and $D_\varepsilon$:

$$D_k = -2\mu\,\|\nabla\sqrt{k}\|^2 \qquad (14)$$

$$D_\varepsilon = \frac{2\mu\mu_t}{\rho}\,\|\mathrm{rot}(\mathrm{rot}\,\vec v)\|^2 \qquad (15)$$

The quantities $f_2$ and $f_\mu$ are also linked to the
damping of turbulence near walls. They are functions of
the turbulent Reynolds number $Re_t$:

$$Re_t = \frac{\rho k^2}{\mu\varepsilon} \qquad (16)$$

$$f_2 = 1 - 0.3\,\exp(-Re_t^2) \qquad (17)$$

$$f_\mu = \exp\!\left(\frac{-2.5}{1 + Re_t/50}\right) \qquad (18)$$

This choice of damping functions, which involves neither
the wall distance nor the wall friction, allows the
turbulence model to be programmed independently of the
application considered, which is an important advantage
for a code treating complex multi-domain applications
such as FLU3M.
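
Because the damping functions depend only on local flow quantities, they can be coded as pointwise functions. The Python sketch below evaluates the turbulent Reynolds number, the two damping functions and the eddy viscosity in the standard Jones-Launder form, with $C_\mu = 0.09$ as quoted in the text. The function names and the sample values are illustrative assumptions.

```python
import math

C_MU = 0.09  # model coefficient quoted in the text


def turbulent_reynolds(rho, k, eps, mu):
    """Turbulent Reynolds number Re_t = rho k^2 / (mu eps)."""
    return rho * k * k / (mu * eps)


def f_mu(re_t):
    """Damping function applied to the eddy viscosity."""
    return math.exp(-2.5 / (1.0 + re_t / 50.0))


def f_2(re_t):
    """Damping function applied to the dissipation destruction term."""
    return 1.0 - 0.3 * math.exp(-re_t * re_t)


def eddy_viscosity(rho, k, eps, mu):
    """Eddy viscosity mu_t = C_mu f_mu rho k^2 / eps."""
    re_t = turbulent_reynolds(rho, k, eps, mu)
    return C_MU * f_mu(re_t) * rho * k * k / eps


# Illustrative values (SI units), not taken from the paper.
rho, k, eps, mu = 1.2, 10.0, 100.0, 1.8e-5
re_t = turbulent_reynolds(rho, k, eps, mu)
print(re_t, f_mu(re_t), f_2(re_t), eddy_viscosity(rho, k, eps, mu))
```

No wall distance or friction velocity appears in any argument list, which is the property the text highlights for multi-domain applications.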

The (k, ε) turbulence model just described can be
combined in the FLU3M code with either a single-species
or a two-species formulation. In the latter, the setting
is the compressible turbulent flow of a non-reacting
mixture of two species, each species being assumed to be
a perfect gas with constant specific heats. The equations
are discretized in the same way for the "single-species
(k, ε)" system and for the "two-species (k, ε)" system.

The convective fluxes are discretized using extensions of
Roe's Riemann solver to coupled systems of equations. The
eigenvalues of the Jacobian matrix are:
$\lambda_1 = u$ (multiplicity $neq - 2$),
$\lambda_2 = u + c$ (multiplicity 1),
$\lambda_3 = u - c$ (multiplicity 1), where $neq$ is the
total number of equations (7 for single species and 8
for two species). The quantity $c$ is a modified speed
of sound given by:

$$c^2 = \frac{\gamma}{\rho}\left(p + \frac{2}{3}\,\rho k\right) \qquad (19)$$

The ratio of specific heats $\gamma$ is assumed constant
in the single-species formulation, whereas in the
two-species formulation it depends on the partial
densities and specific heats of the two species as
follows:

$$\gamma - 1 = \frac{\rho_1 C_{v1}(\gamma_1 - 1) + \rho_2 C_{v2}(\gamma_2 - 1)}{\rho_1 C_{v1} + \rho_2 C_{v2}} \qquad (20)$$

The Roe numerical flux is:

$$F^R(U_L, U_R) = \tfrac12\,\big(F(U_L) + F(U_R)\big) - \tfrac12\,P^R\,|\Lambda^R|\,(P^R)^{-1}\,(U_R - U_L) \qquad (21)$$

where $\Lambda^R$ denotes the diagonal matrix of
eigenvalues, and $P^R$ and $(P^R)^{-1}$ the transformation
matrices. The superscript $R$ indicates that the
quantities are computed with Roe averages.

Second-order accuracy in space is obtained with the
MUSCL approach applied to the primitive variables. The
viscous fluxes are evaluated with a centred
discretization in space. In the two-species formulation,
diffusion between species is taken into account through
a Lewis number $Le$ and a turbulent Lewis number $Le_t$.
Convergence is accelerated by an implicit phase and by
local time stepping. The implicit phase relies on a
linearization of the van Leer fluxes for the convective
fluxes, a linearization similar to Coakley's for the
viscous fluxes, a simplified linearization of the
negative part of the source terms, and an ADI inversion
of the implicit matrix.
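
The two thermodynamic closures used by the Roe solver, the mixture ratio of specific heats (20) and the modified speed of sound (19), are simple pointwise formulas. The Python sketch below evaluates them; the function names and sample values are illustrative assumptions, and the sound speed is written with the effective pressure $p + \frac{2}{3}\rho k$.

```python
import math


def mixture_gamma(rho1, cv1, gamma1, rho2, cv2, gamma2):
    """Ratio of specific heats of a two-species mixture, relation (20)."""
    num = rho1 * cv1 * (gamma1 - 1.0) + rho2 * cv2 * (gamma2 - 1.0)
    den = rho1 * cv1 + rho2 * cv2
    return 1.0 + num / den


def modified_sound_speed(gamma, rho, p, k):
    """Speed of sound including the turbulent pressure (2/3) rho k."""
    return math.sqrt(gamma / rho * (p + 2.0 / 3.0 * rho * k))


# Illustrative values: partial densities, specific heats at constant
# volume and ratios of specific heats of two hypothetical species.
g_mix = mixture_gamma(0.5, 718.0, 1.4, 0.7, 650.0, 1.3)
c_mod = modified_sound_speed(g_mix, 1.2, 1.0e5, 50.0)
print(g_mix, c_mod)
```

When both species share the same ratio of specific heats the mixture value reduces to it, and with $k = 0$ the usual speed of sound is recovered.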

By way of example, we present here results obtained with
the "single-species (k, ε)" formulation, first on a
shock-wave/boundary-layer interaction in a
two-dimensional channel, then on an axisymmetric
afterbody configuration. The first configuration
corresponds to an experiment [14] carried out at ONERA
in a symmetric nozzle. The Mach number upstream of the
interaction is 1.45. Figure 6, which shows the computed
iso-Mach lines, displays the classical lambda-shock
structure in the interaction region and the strong
thickening of the boundary layer resulting from its
interaction with the shock. Figure 7 compares the
computed wall-pressure distribution with experiment. The
pressure plateau obtained by the computation in the
interaction region is smaller than in the experiment,
which corresponds to a slight underestimate of the size
of the separated region. On this point the result is
comparable to results obtained previously with other
codes using the (k, ε) model on the same configuration.

The second configuration is an axisymmetric afterbody
fitted with a nozzle. The external flow conditions are
as follows: Mach number 4.18, stagnation temperature and
pressure equal to 325 K and 10 bar respectively. The jet
stagnation pressure is higher, equal to 42.3 bar. The
Reynolds number based on the critical quantities of the
jet and on the base radius is 1.15 × 10^[?]. The
computational domain is divided into three subdomains:
a subdomain D1 for the external flow, a subdomain D3 for
the nozzle exit and the jet, and an intermediate
subdomain D2 containing the base region. The total
number of grid points is 13,879. Strong refinements are
introduced near the walls. For example, the cell size
near the outer wall of the afterbody is 10^[?] R. On the
upstream boundary of the external subdomain D1, profiles
taken from the experimental data are imposed for the
velocity and the turbulent quantities, whereas on the
upstream boundary of subdomain D3 the imposed profiles
come from a preliminary computation of the flow in the
nozzle.

Figure 8, which shows the solution as iso-Mach lines,
displays the classical barrel shape of the jet and the
shock wave located in the jet. A comparison with
experiment [15] is shown in Figures 9 and 10, as profiles
of turbulent kinetic energy and pitot pressure in two
sections located downstream of the base at distances of
0.5972 and 6 R. The experimental points were obtained by
laser velocimetry and with a Pitot tube. Although the
experimental data for the turbulent kinetic energy cover
only the outer part of the flow, the agreement appears
satisfactory.

4. DDLU factorization
4.1 Description of the algorithm

Von Neumann linear stability analysis of the approximate
ADI factorization reveals an unconditional instability
in 3D (cf. Ying [21]). Even if the nonlinear terms play
a stabilizing role, as codes using this approach tend to
show, the triple ADI factorization remains penalized by
a large operation count and above all by a severe cfl
restriction due to the factorization error in
$\Delta t^3$. A DDLU-type factorization has therefore
been developed to improve the efficiency of the implicit
algorithm. The first LU decomposition methods for the
implicit matrix were proposed simultaneously by Jameson
and Turkel [17] and by Steger and Warming [20] in 1981.
Whereas alternating-direction techniques replace the
implicit operator by an operator factorized along the
grid directions, the LU method decomposes it into two
triangular matrices, upper and lower. Jameson and Turkel
show that such a system is well conditioned if the
matrices are diagonally dominant. They therefore
proposed a splitting of the Jacobian matrices of the
implicit matrix that increases the diagonal. In the
original scheme, the implicit system is put in the form:

$$L\,U\,\Delta Q = -\Delta t\,R \qquad (22)$$

For an upwind discretization, this gives:

$$L = I + \Delta t\,(\delta_\xi^- A^+ + \delta_\eta^- B^+ + \delta_\zeta^- C^+)$$

$$U = I + \Delta t\,(\delta_\xi^+ A^- + \delta_\eta^+ B^- + \delta_\zeta^+ C^-) \qquad (23)$$

where $A^+, A^-, B^+, B^-, C^+, C^-$ are the flux Jacobian
matrices and $\xi, \eta, \zeta$ the curvilinear
coordinates. This scheme is rarely used in this form.

Jameson and Turkel [17] having shown that diagonal
dominance ensures good conditioning of the factors L and
U, Jameson and Yoon [18] developed a variant (called
here DDLU by analogy with DDADI) that reinforces the
diagonal for the Euler equations, followed by Yoon and
Jameson [22] for the Navier-Stokes equations, Shuen and
Yoon [19] for reacting flows, and finally Darracq (1995)
[16] for the k-ε model. With this formalism:

$$L\,D^{-1}\,U\,\Delta Q = -\Delta t\,R \qquad (24)$$

with:

$$L = I + \Delta t\,(\delta_\xi^- A^+ + \delta_\eta^- B^+ + \delta_\zeta^- C^+ - A^- - B^- - C^-)$$

$$D = I + \Delta t\,(A^+ - A^- + B^+ - B^- + C^+ - C^-) \qquad (25)$$

$$U = I + \Delta t\,(\delta_\xi^+ A^- + \delta_\eta^+ B^- + \delta_\zeta^+ C^- + A^+ + B^+ + C^+)$$

The Jacobian matrices $A^+$, $B^+$ and $C^+$
(respectively $A^-$, $B^-$ and $C^-$) are constructed so
that they have only positive (respectively negative)
eigenvalues, that is:

$$A^+ = T_\xi\,\Lambda_\xi^+\,T_\xi^{-1}, \qquad A^- = T_\xi\,\Lambda_\xi^-\,T_\xi^{-1}$$

$$B^+ = T_\eta\,\Lambda_\eta^+\,T_\eta^{-1}, \qquad B^- = T_\eta\,\Lambda_\eta^-\,T_\eta^{-1} \qquad (26)$$

$$C^+ = T_\zeta\,\Lambda_\zeta^+\,T_\zeta^{-1}, \qquad C^- = T_\zeta\,\Lambda_\zeta^-\,T_\zeta^{-1}$$

In general, the splitting of the Jacobian matrices can be
written in the form:

$$A^\pm = \tfrac12\,T_\xi\,[\Lambda_\xi \pm \gamma(\Lambda_\xi)]\,T_\xi^{-1} \qquad (27)$$

and likewise for $B^\pm$ and $C^\pm$, with
$\Lambda_{\xi,\eta,\zeta}$ the diagonal matrix of the
eigenvalues $\lambda_{\xi,\eta,\zeta}$.

The function $\gamma$ describes the character of the
upwinding. In the classical splitting, $\gamma$ is
defined by:

$$\gamma(\Lambda_{\xi,\eta,\zeta}) = |\Lambda_{\xi,\eta,\zeta}| = \mathrm{diag}(|\lambda_{\xi,\eta,\zeta}|) \qquad (28)$$

In the Jameson and Turkel splitting [17] (2), the aim is
to increase diagonal dominance:

$$\gamma(\Lambda_{\xi,\eta,\zeta}) = \beta\,\max(|\lambda_{\xi,\eta,\zeta}|)\,I \qquad (29)$$

with $\beta \ge 1$. Relations (25) and (26), together
with (27), give:

$$D = I + \Delta t\,[\,T_\xi\,\gamma(\Lambda_\xi)\,T_\xi^{-1} + T_\eta\,\gamma(\Lambda_\eta)\,T_\eta^{-1} + T_\zeta\,\gamma(\Lambda_\zeta)\,T_\zeta^{-1}\,] \qquad (30)$$

The diagonal D has a block structure for the splitting
(28) but becomes scalar when (29) is used:

$$D = I + \Delta t\,\beta\,[\max(|\lambda_\xi|) + \max(|\lambda_\eta|) + \max(|\lambda_\zeta|)]\,I \qquad (31)$$

Note that this reduction of the block diagonal to a
scalar diagonal holds for factorizations of DDLU, DDADI
and even ADI type. By contrast, the basic LU
factorization (23), because of its asymmetric nature,
cannot benefit from this diagonalization. The Jacobian
matrices at the interfaces are evaluated from the Roe
average in order to remain consistent with the explicit
scheme.
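
The reduction of the block diagonal to a scalar one can be checked numerically in a single direction. The Python sketch below builds a diagonalizable matrix with the eigenvalue pattern $u, u, u, u+c, u-c$, applies the splitting (27) with the choice (29), and verifies that $A^+ - A^-$ is a multiple of the identity while $A^+$ and $A^-$ have eigenvalues of a single sign. The eigenvector matrix is random and purely illustrative; it is not an Euler eigenvector matrix.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical, well-conditioned eigenvector matrix and an eigenvalue
# pattern of the form u, u, u, u + c, u - c (nondimensional values).
T = rng.normal(size=(5, 5)) + 5.0 * np.eye(5)
lam = np.array([0.3, 0.3, 0.3, 1.3, -0.7])
A = T @ np.diag(lam) @ np.linalg.inv(T)

# Splitting (27) with gamma(Lambda) = beta * max|lambda| * I, as in (29):
# A± = (1/2) T [Lambda ± beta*max|lambda| I] T^-1 = (1/2)(A ± radius I).
beta = 1.0
radius = beta * np.max(np.abs(lam))
A_plus = 0.5 * (A + radius * np.eye(5))
A_minus = 0.5 * (A - radius * np.eye(5))

print(np.allclose(A_plus + A_minus, A))
print(np.allclose(A_plus - A_minus, radius * np.eye(5)))
print(np.linalg.eigvals(A_plus).real.min(), np.linalg.eigvals(A_minus).real.max())
```

Since $A^+ - A^-$ is scalar, each direction contributes only a scalar to D in (25), which is why no block inversion is needed with this splitting.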

Oblique-plane sweep

Sweeping the computational domain along the diagonal
directions (i+j+k constant), in increasing order
(operator L) and then in decreasing order (operator U),
allows the inversion of the matrices L and U to be fully
vectorized by avoiding non-vectorizable recurrences.
The point-to-point recurrence of the ADI factorization
becomes a plane-to-plane recurrence in the DDLU
factorization.

Moreover, the diagonal sweep involves, around the
current point, points that were updated at the previous
step. They can therefore be added to the right-hand
side: no block has to be inverted.


Fig. 12: Oblique sweep in 3D

For comparison, an SSOR-type algorithm has been
implemented with the splitting (29) leading to the
scalar diagonal (31). It is an iterative symmetric
over-relaxation approach with oblique sweep:

$$(D + \omega L)\,\Delta Q^{k+1/2} = -\omega\,\Delta t\,R + (1-\omega)\,D\,\Delta Q^{k}$$

$$(D + \omega U)\,\Delta Q^{k+1} = -\omega\,\Delta t\,R + (1-\omega)\,D\,\Delta Q^{k+1/2}$$

(2) This splitting must be distinguished from the
spectral-radius technique applied to upwind schemes [20]:
$\Lambda_\xi^{\pm} = \max(|\lambda_\xi^{\pm}|)\,I$
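
The plane-to-plane recurrence of the oblique sweep can be made concrete by enumerating the planes $i+j+k = \text{constant}$. The Python sketch below groups the points of a small structured block by plane and checks that the lower neighbours of every point lie in the previous plane, so that all points of one plane can be updated together. The function name and block size are illustrative assumptions.

```python
from collections import defaultdict


def oblique_planes(ni, nj, nk):
    """Group the points (i, j, k) of a block by the value of i + j + k."""
    planes = defaultdict(list)
    for i in range(ni):
        for j in range(nj):
            for k in range(nk):
                planes[i + j + k].append((i, j, k))
    # i + j + k runs from 0 to ni + nj + nk - 3.
    return [planes[p] for p in range(ni + nj + nk - 2)]


planes = oblique_planes(4, 3, 5)

# Forward sweep (operator L): plane p needs only plane p - 1.
for p, points in enumerate(planes):
    for (i, j, k) in points:
        for nb in ((i - 1, j, k), (i, j - 1, k), (i, j, k - 1)):
            if min(nb) >= 0:
                assert nb in planes[p - 1]

print(len(planes), [len(pl) for pl in planes])
```

The backward sweep (operator U) visits the same planes in reverse order, with the upper neighbours lying in plane p + 1.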


4.2 Application

We present here the application of the DDLU algorithm, in its
inviscid (perfect-fluid) version, to the test case of a lenticular
fuselage with a boat-tail. The Mach number is 4.5 and the angle of
attack is 10°. The mesh consists of 42 x 27 x 44 points. Figure 11
shows the solution obtained with the DDLU scheme. The solution given
by the ADI scheme is identical. Figure 13 shows the convergence
history of the implicit density residuals. The CFL number is ramped
up to 500 for the DDLU and SSOR formulations and up to 100 for the
ADI formulation. The implicit DDLU scheme converges faster than the
implicit ADI scheme. The SSOR algorithm converges faster than the
DDLU algorithm, but the number of inner iterations, about a dozen,
increases the computing time, which becomes comparable to that of
the ADI scheme. Table 1 gives the computing time per point and per
iteration and the number of 3D arrays needed to store the implicit
matrix, for the ADI and DDLU decompositions.

The DDLU version is 2.3 times faster than the ADI version and, in
addition, the LU scheme requires nearly 1.5 times less memory to
store the implicit matrices. Indeed, only a single vector Dijk is
stored at each current point, whereas the ADI factorization requires
memory to be reserved at each point for three blocks. Moreover, the
ADI factorization uses temporary arrays during the inversion.


Algorithm        CPU time per point per iteration (µs)   Memory (3D arrays)
ADI Euler 3D                     49                              225
DDLU Euler 3D                    21                              155


Fig. 13: Comparison of the convergence rates of the algorithms
(FIACRE, Mach 4.5)

5. Conclusion

Three recent developments in the FLU3M code have been presented.

The unsteady Euler equations (on a moving, non-deforming mesh) were
discretized with a second-order-in-time implicit Runge-Kutta scheme
combined with van Leer fluxes. The 2D and 3D validation cases
presented demonstrate the accuracy and speed of the method, which is
as accurate as an explicit computation but 70 times faster.

The implementation of the Jones-Launder k-ε model, including for a
two-species gas, was then described. We use the Roe solver to solve
the complete system of Navier-Stokes equations coupled with the
transport equations for k and ε. Although satisfactory results have
been obtained, notably on an afterbody case, studies on the
application of the model remain to be carried out; in particular,
the initial field of the variables k and ε must be determined to
start the computation; moreover, relaminarization phenomena may
appear.

Finally, the implicit DDLU decomposition algorithm reduces the memory
and computing-time costs of each iteration. This algorithm can be
extended to the equations with the k-ε model.

Acknowledgements: The developments relating to the k-ε model, as well
as those relating to unsteady flow, were supported by Aerospatiale
Espace et Defense and by CNES. The work on the DDLU algorithm was
carried out as part of the thesis of D. Darracq, CIFRE trainee at
ONERA-SNECMA.


Table 1: Comparison of computing times

k-ε computations

Fig. 6 - Délery nozzle: iso-Mach-number lines

Fig. 7 - Délery nozzle: wall pressure profile

Fig. 8 - Afterbody: iso-Mach-number lines, ΔM = 0.2

Fig. 9 - Afterbody: turbulent kinetic energy profile at x/R = 0.59

Fig. 10 - Afterbody: pressure profile at z/R = 6


Unsteady computations

(Panels: lift and moment; reference curve: explicit computation)

Fig. 1 - First-order-in-time implicit scheme (CFL = 400)

Fig. 2 - Second-order-in-time implicit Runge-Kutta scheme (CFL = 400)

Fig. 3 - Sphere-cone profile

Fig. 4 - Iso-Mach-number lines at 3 cycles, ΔM = 0.5

Fig. 11 - Lenticular fuselage: iso-Mach-number lines


Summary

With the high costs associated with flight and wind tunnel testing,
the computation of aircraft store trajectories is becoming more
important to the military establishment. In Canada, the Department
of National Defence (DND) requested IAR to acquire or develop the
tools necessary to predict the trajectories of stores on release
from aircraft, particularly the DND's CF-18 aircraft. After debating
whether to use structured Chimera schemes or unstructured schemes,
IAR decided to use the latter, as a development program was already
underway in that field of research. IAR had already demonstrated
that hybrid (structured/unstructured) grids produced successful
results and decided to pursue this approach for the unsteady 3-D
computations. To this end, a study has been made of the 2-D case of
a 'store' moving away from the parent 'body'. Grid generation is
underway for the full CF-18 aircraft using a commercial code, and
several simpler cases have been gridded and computed in a steady 3-D
environment.

1. Introduction

Accurate  prediction  of  the 
trajectory  of  a  store  released  from 
an  aircraft  is  critical  in  assessing 
whether  the  store  can  be  released 
safely.  The  trajectory  of  stores 
released  in  aircraft  flowfields  has 
always  been  difficult  to  predict.  A 
typical  wind-tunnel/  flight  test 
program  intended  to  ensure  that  the 
store  will  release  properly  is 
lengthy  and  costly.  It  may  involve 
20  flight  tests,  one  or  two  wind 
tunnel  entries,  and  extend  over  a 
period  of  several  years.  In  the 
event of an improper trajectory, pylon and/or attachment point
modifications may have to be made, resulting in more flight and
wind-tunnel testing (Ref 1).

However, with the advancement of computational fluid dynamics (CFD)
techniques, a much faster prediction of carriage and trajectory data
is believed to be possible. In particular, sufficiently reliable
computed flowfield data could reduce the test matrix and supplement
the measured data such that the additional testing could be reduced
or eliminated. Further, it is anticipated that computed flowfields
could serve as a diagnostic aid in deciding among possible solutions
to design problems. Both multiblock structured overlapping grid
methods (Chimera; see for example [2,3]) and unstructured grid
methods (for example [4] and [5]) have been used to solve
multi-component and moving-body systems.

Both Chimera and unstructured methods have their advantages and
disadvantages, and after careful consideration IAR decided to take
the unstructured grid route for its main thrust at tackling the
problem, one of the main reasons being the availability of codes.
Also, with multiblock techniques the grid cells tend to stay small
and very stretched in some areas remote from the aircraft, making
the method less efficient. Several commercial codes were at first
considered as possible contenders for predicting store release. Most
of them were rejected after an initial survey, and only the two
codes RAMPANT [6] and FASTRAN [7] finally made the 'short list'.
Several test cases were run on the short-listed codes.

After in-house evaluation of these codes it was found that neither
was fully satisfactory, and attention turned to acquiring only a
suitable grid generation program. Thus IAR evaluated the packages
I-DEAS [8], FLITE3D [9] (which also contains a solver) and ICEM
[10], and eventually concluded that ICEM was the best package. IAR
then decided to develop its own 3-D Euler solver from the existing
2-D solver, which turned out not to be too time consuming.
Validation of this 3-D code, called FJ3SOLV, is covered below, and
it will be seen that promising results are obtained. Eventually the
aim is to develop this 3-D solver into a fully time-dependent code
with moving store and grid; in the meantime a 2-D study has been
underway and is reported in the last section below. This 2-D study
will benefit the 3-D development, as various efficiencies will be
transferred to the 3-D code.

Paper presented at the AGARD FDP Symposium on "Progress and Challenges in CFD Methods and Algorithms"
held in Seville, Spain, from 2-5 October 1995, and published in CP-578.

In the first section some background material on the unstructured
grid developments at IAR is reviewed. Then the 3-D grid generation
and solver code acquisitions and developments, and their
validations, are described. Finally the unsteady 2-D code
development is covered.

2. 2-D Steady State Code Developments

A  fully  unstructured  Delaunay  grid 
generation  code  was  developed 
several  years  ago  and  is  reported  in 
Refs  11  and  12.  It  uses  the  standard 
Delaunay  triangulation  technique 
[13]  with  new  points  added 
continually  at  the  centroids  of 
existing  triangles  until  a  criterion 
of  required  grid  density  is 
fulfilled. 

The Euler equations are solved using a cell-centred finite volume
technique with explicit artificial viscosity as in Ref 14. Standard
acceleration techniques such as local time stepping, enthalpy
damping and implicit residual smoothing are used. Euler solutions on
these grids were obtained for several airfoils and showed good
accuracy compared to standard AGARD test cases [15]. On advancing to
Navier-Stokes solutions and trying to place cells very close to the
surface within the boundary layer, it was found that the grids
became of poor quality, even for wall-function type grids (for
example, Fig 1a). It was therefore decided that a more satisfactory
grid could be obtained by using structured layers of grid near the
surface, followed by an unstructured grid outside these layers. The
structured grid layers were generated using advancing normals, with
some averaging to avoid clashing of the normals, in some cases, as
they advanced from the surface. An example of such a grid is shown
in Fig 1b for the RAE 2822 airfoil. Having generated these grids for
Navier-Stokes computations, the same grids were then used for Euler
computations. These solutions also appeared to be very accurate, as
shown in Fig 2a for the RAE 2822 airfoil on a medium grid of 60
points on each of the upper and lower surfaces. Similarly, a
Navier-Stokes solution is shown in Fig 2b, and further results were
presented at the ETMA workshop (Ref 16).

The hybrid approach was therefore the preferred choice for stores
release, since accurate solutions had been obtained with the 2-D
steady version of the code, as mentioned above. It will be shown
later that the 2-D unsteady version also gives good results.
Although it is hoped eventually to use a 3-D hybrid grid generator,
none has yet been acquired.

3. Review of 3-D Commercial Unstructured Codes

Initially it was planned to acquire a commercial code for both the
grid generation and the 3-D solver. Several possible codes were
rejected after a preliminary evaluation made by calling users of
these codes. Codes that only had a grid generation capability were
rejected, as we wanted the whole package including the solver and
post-processing. Finally the codes RAMPANT and FASTRAN were selected
for in-house evaluation, and a report on these codes has been made
in Ref 17. In summary, it was found that FASTRAN could not deliver
good Navier-Stokes solutions for the RAE 2822 and that RAMPANT,
although reasonable results were obtained in some cases, was not
robust and was very slow even in the Euler mode of operation. Later
the FLITE3D codes [9] were evaluated, but these were found to be
lacking in terms of pre- and post-processing and user support.

Having reached the point of not finding a complete package for grid
generation and solver, we then had to decide whether to develop our
own codes. The idea of developing a 3-D grid generator was not
relished, whereas writing an unstructured grid Euler solver (and
later a Navier-Stokes solver) appeared to be quite feasible. Thus
IAR, with a view to first acquiring a grid generation package,
contacted vendors and evaluated several unstructured grid
generators, including I-DEAS [8] (mainly used for structural
analysis) and GFEM [18]. These codes were eventually rejected, as
they were either very cumbersome to use or very slow in generating
fairly simple grids.

Finally, the code ICEM [10] was evaluated and was found to be quite
promising. A copy of this software was also obtained for evaluation.
It has a good user interface, preprocessing and postprocessing. Its
CAD software can build complicated wire frame surface models
efficiently, and can take point data in PLOT3D format and IGES files
from other CAD systems. This grid generation package supports point,
line and volume sources for density control and can generate 3-D
unstructured meshes efficiently. The Octree method is used, which
refines the grid by subdividing the tetrahedron into eight smaller
tetrahedra until a satisfactory grid density has been reached. Some
examples of these grids are shown later.
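
The one-to-eight subdivision mentioned above can be sketched for a
single tetrahedron. This is an illustration under assumptions, not
ICEM's implementation: it uses the common edge-midpoint split (four
corner tetrahedra plus an inner octahedron cut along one diagonal),
and the names `subdivide_tet` and `tet_volume` are hypothetical.

```python
def midpoint(p, q):
    """Midpoint of two 3-D points given as tuples."""
    return tuple((a + b) / 2.0 for a, b in zip(p, q))


def tet_volume(a, b, c, d):
    """Unsigned volume of a tetrahedron: |det[b-a, c-a, d-a]| / 6."""
    u = [b[i] - a[i] for i in range(3)]
    v = [c[i] - a[i] for i in range(3)]
    w = [d[i] - a[i] for i in range(3)]
    det = (u[0] * (v[1] * w[2] - v[2] * w[1])
           - u[1] * (v[0] * w[2] - v[2] * w[0])
           + u[2] * (v[0] * w[1] - v[1] * w[0]))
    return abs(det) / 6.0


def subdivide_tet(v0, v1, v2, v3):
    """Split one tetrahedron into eight children using edge midpoints."""
    m01, m02, m03 = midpoint(v0, v1), midpoint(v0, v2), midpoint(v0, v3)
    m12, m13, m23 = midpoint(v1, v2), midpoint(v1, v3), midpoint(v2, v3)
    return [
        # Four corner tetrahedra, one at each original vertex.
        (v0, m01, m02, m03),
        (v1, m01, m12, m13),
        (v2, m02, m12, m23),
        (v3, m03, m13, m23),
        # Inner octahedron, split into four along the diagonal m01-m23.
        (m01, m23, m02, m03),
        (m01, m23, m03, m13),
        (m01, m23, m13, m12),
        (m01, m23, m12, m02),
    ]


# Illustrative parent cell: the unit right-angled tetrahedron.
parent = ((0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (0.0, 1.0, 0.0), (0.0, 0.0, 1.0))
children = subdivide_tet(*parent)
child_volumes = [tet_volume(*t) for t in children]
```

Every child has one eighth of the parent volume, so repeated
application refines the grid density uniformly in the selected
region without creating gaps or overlaps.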

The idea of generating structured layers of tetrahedra near the
surface will be pursued with the vendor of ICEM, or IAR may develop
its own capability using advancing normals, as was done for 2-D. In
the meantime the grid generator will be used solely in its
unstructured form, which may be acceptable for Euler solutions.

4.  Development  of  a  3-D  Euler 
Solver  and  Validation 

Rather than trying to acquire a commercial 3-D solver, IAR decided
it was more suitable to develop a 3-D Euler solver from the existing
2-D solver, especially since the code would eventually have to be
made into an unsteady version. Converting the existing 2-D code,
FJSOLV, into a 3-D version, FJ3SOLV, was relatively easy because the
logic of the 2-D code is driven by the edges of the triangles: the
flux across each edge is computed once and added to or subtracted
from the total flux balance of the triangles on either side of the
edge. The same principle was used for 3-D, but now the 'edge' is a
face of a tetrahedron. In the far field there is no vortex
correction as in 2-D, and the Riemann invariants alone are used.
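
The edge-driven flux balance described above can be sketched as
follows. This is a minimal illustration, not the FJSOLV source: the
data layout (`edges` as tuples of left cell, right cell and a
precomputed flux) and the name `accumulate_residuals` are
assumptions, and the flux values simply stand in for whatever a flux
routine would return. A boundary edge is marked here with a right
cell index of -1.

```python
def accumulate_residuals(n_cells, edges):
    """Build per-cell flux balances with a single loop over edges (or faces).

    Each flux is taken as positive when leaving the 'left' cell, so it is
    added to the left cell's balance and subtracted from the right cell's.
    """
    residual = [0.0] * n_cells
    for left, right, flux in edges:
        residual[left] += flux
        if right >= 0:  # interior edge: the neighbour receives the same flux
            residual[right] -= flux
    return residual


# Hypothetical three-cell mesh: two interior edges and two boundary edges.
edges = [
    (0, 1, 2.0),    # interior edge between cells 0 and 1
    (1, 2, -0.5),   # interior edge between cells 1 and 2
    (0, -1, 1.0),   # boundary edge of cell 0
    (2, -1, -3.0),  # boundary edge of cell 2
]
residual = accumulate_residuals(3, edges)
```

Because each interior flux is added once and subtracted once, the
balances sum to the boundary fluxes only, which is the discrete
conservation property. The same loop applies in 3-D if each tuple
describes a tetrahedron face instead of a triangle edge.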

To  validate  the  new  Euler  code 
FJ3SOLV,  we  first  considered  the  RAE 
2822  airfoil  spread  out  as  a 
straight  wing  between  two  solid 
walls.  A  boundary  condition  of  no 
normal  flow  was  imposed  at  the  walls 
so  that  the  flow  should  be  two- 
dimensional  with  no  variation  across 
the  span.  The  ICEM  grid  generation 
package  was  used  to  generate  a  grid 
and  two  views  are  shown  on  Fig  3a. 
This  grid  was  then  used  to  generate 
a  solution  shown  in  Fig  3b.  Note 
that  the  grid  was  not  refined  about 
the  shock  wave  at  about  70%  chord  on 
the  upper  surface  and  so  produced  a 
result  that  was  quite  smeared  out 
around  the  shock  wave.  On  refinement 
of  the  grid  around  the  shock,  shown 
on  Fig  4a,  a  much  improved  shock  was 
obtained  and  good  two  dimensionality 
was  shown  with  little  spanwise 
variation  in  the  pressure.  The 
accuracy  for  the  airfoil  pressure 
distribution  is  demonstrated  in  Fig 
4b  where  the  solution  at  one 
spanwise  station  (FJICEM),  obtained 
by  interpolation  from  nearby  values, 
is  compared  to  a  completely  2-D 
solution.  The  latter  solution  was 
obtained  with  the  2-D  solver  FJSOLV 
with  30  points  on  each  of  the  upper 
and  lower  surfaces  (called  FJDJ30  on 
the  figure);  it  was  also  run  without 
a  vortex  correction  and  with  roughly 
the  same  far  field  distance  as  in 
the  3-D  case.  The  solution  using 
FJ3SOLV  took  8  hours  on  the  SGI 
Power  Challenge  computer  and  used 
about  299,000  grid  cells. 

Having achieved success with the '3-D' airfoil, a more challenging
case of some practical interest was considered next. The M-100
wing-body configuration (Ref 19) had been used when checking the
RAMPANT code. It is a good case, as the experimental data are quite
reliable and it has been used as a test case by Grumman for a
Navier-Stokes code evaluation [20]. It was also considered to be a
more realistic case for the ICEM grid generator, as it has to cope
with the intersecting surfaces of the wing and body. Grids obtained
using ICEM are shown in Fig 5a. Results at various spanwise
locations are shown in Fig 5b. Here several grid/code combinations
are compared: FJ3SOLV using the ICEM grid (designated Fjicem on the
figure), FJ3SOLV using the grid generated by the RAMPANT software
(Fjrampant) and, lastly, the RAMPANT grid with the RAMPANT solution
(Rampant). It can be seen that there is good agreement with the
experimental data, the differences being typical of an inviscid
'Euler' result compared to experiment. Also note that the present
results give a slight improvement over the RAMPANT results, while
the computer time is down from 30 hours to 6 hours. The RAMPANT grid
had 240,000 tetrahedra while the ICEM grid had 250,000 tetrahedra.

5.  Unsteady  2-D  Code 
Development  and  Validation 

The  existing  steady  state  2-D 
unstructured  grid  code  FJSOLV  was 
developed  into  an  unsteady  version 
so  that  moving  stores  could  be 
simulated  in  2-D  and  some  of  the 
problems  investigated  in  2-D  before 
proceeding,  at  some  later  date,  to  a 
moving  body  3-D  code.  Thus  we  can 
investigate  such  items  as  using  a 
'window'  around  the  store  to  keep 
the  grid  fixed  there  relative  to  the 
'store',  moving  the  grid  only  within 
a  second  'window'  so  that  not  all 
the  grid  is  moved,  grid  refinements 
and  grid  interpolation. 

For moving grids, the geometric conservation law (GCL) must be
satisfied for consistency. This law, which establishes the relations
for the conservation of surfaces and volumes of the control cells,
plays a key role in this flow simulation. If it is violated, the
convective velocities are misrepresented [21]. For domains bounded
by moving boundaries, the mesh must follow the geometry of the
computational domain. Usually points initially on the boundaries
stay attached to those boundaries at the same relative locations, as
is done here. For the movement of interior points two approaches are
used: the first uses spring analogies [22], while the second
computes velocities at the nodes by some kind of diffusive process
and then evaluates the displacements as the product of velocity and
time step. It has been demonstrated that the first approach is not
failure proof [23]. The second approach seems more promising and is
used here. The velocity of an interior point is computed as the
average of the velocities of the surrounding points, with the
velocities of the boundary points as boundary conditions. At the
first time step, the interior velocities are initialized to zero.
They are then successively updated by a series of Jacobi iterations.
This process gives a velocity distribution similar to that obtained
by a diffusive operator. For motions of large amplitude, since the
velocities of the interior points are smaller than those on the
boundaries, the faster moving nodes will overtake the slower ones,
which will require local remeshing.
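
The averaging process for the interior node velocities can be
sketched as below. It is only an illustration under assumptions: the
mesh is reduced to a node adjacency list, a single velocity
component is treated, and the names `smooth_velocities`,
`neighbours` and `boundary_velocity` are hypothetical rather than
taken from the code described here.

```python
def smooth_velocities(neighbours, boundary_velocity, n_iter=200):
    """Jacobi iterations: each interior node takes the mean of its neighbours.

    neighbours[i] lists the nodes connected to node i; boundary_velocity maps
    boundary node indices to prescribed velocities, which are never modified.
    """
    n = len(neighbours)
    # Interior velocities start from zero, boundary values are imposed.
    vel = [boundary_velocity.get(i, 0.0) for i in range(n)]
    for _ in range(n_iter):
        new_vel = list(vel)  # Jacobi: update from the previous iterate only
        for i in range(n):
            if i in boundary_velocity:
                continue
            new_vel[i] = sum(vel[j] for j in neighbours[i]) / len(neighbours[i])
        vel = new_vel
    return vel


# Hypothetical 1-D chain of five nodes: node 0 moves with the store (velocity 1),
# node 4 lies on the fixed outer window (velocity 0).
neighbours = [[1], [0, 2], [1, 3], [2, 4], [3]]
boundary_velocity = {0: 1.0, 4: 0.0}
vel = smooth_velocities(neighbours, boundary_velocity)

# Displacements follow as velocity times time step (dt chosen arbitrarily here).
dt = 0.01
displacements = [v * dt for v in vel]
```

On this chain the iteration tends to a linear velocity distribution
between the moving and fixed boundaries, which is the behaviour
expected of a diffusive operator.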

The new code was first tested on a standard case [24] for the
oscillating NACA0012 airfoil with M=0.755, α0=2.51, α=0.16° and
reduced frequency 0.0814. The grid generated for this case contained
four layers of structured grid as described earlier. Some very small
cells just aft of the structured zone were removed using interactive
software [25] to improve grid quality. Results for this standard
test case are shown in Figs 6a and 6b, where CN, Cm and several Cp
plots are presented. The results are consistent with other
computations that use Euler methods (for example Ref 26) and are in
fair agreement with the experimental data. It was quickly realized
that the fourth order time marching scheme, as used in [26], was
superior to the first order scheme that several authors are still
using, both in terms of speed and smoothness of solution. The CPU
time for this case was about 36 hours on the SGI Power Challenge
computer; this is quite slow, mainly because some of the cells are
very small. This computation was done with a window, similar to that
used in [27], around the store located at a distance of 0.03
(chord=1) from the airfoil. Within this window the grid was fixed
relative to the airfoil, and movement of the grid was only allowed
outside the window.

Next the code was tested on a NACA0012 falling in free air with
M=0.8, α=0 and a downward velocity of 0.08 (relative to unit
freestream). The trend in CN with increasing time was compared to
the actual steady state result for the equivalent angle of attack.
This grid was completely unstructured and the window was now fixed
at about 1.4 units from the body. Fig 7a shows the initial grid and
also the grid after a plunge of about 1.6 units (for a chord length
of 1). The CN development is shown in Fig 7b, where it can be seen
that the result looks quite accurate, as the normal force CN tends
asymptotically to the true steady state value.
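
For reference, the equivalent angle of attack of the plunging
airfoil follows from the ratio of plunge speed to freestream speed.
The small calculation below assumes the usual quasi-steady relation
alpha_eq = atan(w/U); this relation and the names used are an
assumption for illustration, as the text does not state the formula
that was applied.

```python
import math


def equivalent_alpha_deg(plunge_speed, freestream_speed=1.0):
    """Quasi-steady equivalent angle of attack (degrees) for a plunging body."""
    return math.degrees(math.atan2(plunge_speed, freestream_speed))


# Downward velocity of 0.08 relative to a unit freestream, as in the test case.
alpha_eq = equivalent_alpha_deg(0.08)
print(f"equivalent angle of attack: {alpha_eq:.2f} deg")  # about 4.57 deg
```

Under this assumption the steady reference solution would be the one
computed at roughly 4.6° incidence and M=0.8.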

The  next  computation  was  for  a  more 
realistic  (store)  type  of  body  such 
as  an  ogive-cylinder-ogive  as  shown 
in  Fig  8a  with  an  airfoil/pylon  as 
the  parent  body.  For  a  freestream 
Mach  number  of  0.4,  a  reduced 
frequency  of  0.8  and  a  maximum 
velocity  of  0.064,  this  'store'  was 
moved  down  and  up,  in  a  cyclic 
manner,  a  distance  of  0.16  units 
(based  on  an  airfoil/pylon  chord 
length  of  1)  to  check  physical 
consistency  of  the  results.  Figs  8b 
and  8c  show  the  CN  and  Cm 
developments  with  time  for  three 
cycles.  The  results  look  quite 
physical  as  the  lift  first  increases 
as  it  moves  downward  (seeing  an 
upwash  from  the  fluid),  then  as  the 
gap  increases  the  lift  decreases  as 
the  'channel'  effect  above  the  body 
is  becoming  less  noticeable.  When 
the  body  returns  upward,  the  lift  at 
first  decreases  as  the  body  sees  a 
downwash  from  the  fluid  but  finally 
increases  as  the  channel  effect  is 
stronger.  The  first  window  in  this 
case  was  set  at  a  distance  of  0.03 
from  the  body  which  basically 
covered  only  the  structured  layers 
of  the  grid.  A  second  window  for 
fixing  the  grid  was  also  set  around 
the  wing/pylon.  This  enables  good 
grids  to  be  maintained  near  the 
bodies  where  it  is  felt  to  be 
necessary  to  achieve  an  accurate 
solution. Shown in Fig 8a is the
initial grid before the store starts
to  move,  the  grid  at  the  bottom  of 
the  store's  cycle  and  the  grid  after 
one  complete  cycle.  A  third  window 
was  also  used  in  this  case  so  that 
the  grid  was  only  allowed  to  move 
within  a  distance  of  4  units  from 
the  centre  of  the  airfoil/pylon.  The 
grid  was  fixed  outside  this  window 
allowing  for  greater  efficiency  in 
grid  movement.  The  CPU  time  for  this 
case  on  the  Power  Challenge  was 
about  5  hours. 

This  is  the  current  status  of  the 
unsteady  development  of  the  program. 
Several  more  tests  will  be  performed 
to  check  accuracy  and  then  different 
schemes  for  moving  the  grid  and  for 
integrating  in  time  will  be  studied. 
Implicit  time  marching  schemes  will 
be  investigated  so  that  larger  time 
steps  can  be  taken.  These 
enhancements  will  be  very  useful  in 
the  future  development  of  the  3-D 
version  of  the  code. 

6.  Conclusions 

All  the  pieces  are  now  in  place  to 
complete  the  development  of  an 
unsteady  calculation  applied  to  the 
prediction  of  the  store  trajectory 
after  release  from  the  aircraft.  A 
suitable  3-D  grid  generator  has  been 
identified  in  ICEM  and  a  3-D  Euler 
solver  has  been  developed  in-house. 
To  optimize  the  development  of  the 
3-D  unsteady  version  of  the  final 
code  a  2-D  version  has  first  been 
developed  and  presented  here. 

Lessons  learned  from  this 
development  will  be  incorporated 
into  the  3-D  version  at  a  later 
date.  The  six  degrees  of  freedom 
(6DOF)  equations  defining  the 
motion,  given  the  aerodynamic  forces 
as  computed  from  the  Euler  code, 
will  be  incorporated  into  the 
package  to  provide  a  complete 
trajectory.  Another  future 
development  will  be  to  move  from  the 
Euler  formulation  to  a  Navier-Stokes 
one,  where  structured  grid  layers 
near the surface will be especially
beneficial.

References 

1. Fox, J.H., Donegan, T.L., Jacocks, J.L. and Nichols, R.H. (1991):
"Computed Euler Flowfield for a Transonic Aircraft with Stores",
Journal of Aircraft, Vol. 28, pp. 389-396.

2. Lijewski, L.E. and Suhs, N.E.: "Time-Accurate Computational Fluid
Dynamics Approach to Transonic Store Separation Trajectory
Prediction", Journal of Aircraft, Vol. 31, No. 4, July-Aug 1994.

3. Lijewski, L.E. and Suhs, N.E. (1992): "Chimera-Eagle Store
Separation", AIAA 92-4569-CP.

4. Löhner, R. and Baum, J. (1992): "Comparison of Wing/Pylon/Store
Experiment with an Euler Finite Element CFD Code", AIAA 92-4573.

5. Parikh, P., Pirzadeh, S. and Frink, N.T.: "Unstructured Grid
Solutions to a Wing/Pylon/Store Configuration", AIAA Journal,
Vol. 31, No. 6, Nov 1994.

6. Fluent Inc., Centerra Resource Park, 10 Cavendish Court, Lebanon,
NH, USA.

7. CFD Research Corporation, 3325 Triana Blvd., Huntsville, Alabama,
USA.

8. I-DEAS, Structural Dynamics Research Corporation, 2000 Eastman
Drive, Milford, Ohio, USA.

9. FLITE3D, Computational Dynamics Research, Innovation Centre,
University College, Singleton Park, Swansea, UK.

10. ICEM, CFD Engineering, 2600 Etna Street, Berkeley, California,
USA.

11. Jones, D.J. and MacLeod, B.: "Solution of the Euler Equations
using Unstructured Grids", Fourth Canadian Aeronautics and Space
Institute Aerodynamics Symposium, Toronto, May 1993.

12. Fortin, F. and Jones, D.J.: "Solution of Compressible Inviscid
and Viscous Flows around Single and Multi-Element Airfoils on
Unstructured Meshes", CFD Society of Canada Conference, June 1994.

13. Weatherill, N.P.: "The Delaunay Triangulation in Computational
Fluid Dynamics", Computers and Mathematics with Applications,
Vol. 24, No. 5-6, pp. 129-150, 1992.

14. Jameson, A., Schmidt, W. and Turkel, E.: "Numerical Solution of
the Euler Equations by Finite Volume Methods using Runge-Kutta Time
Stepping Schemes", AIAA Paper 81-1259, 1981.

15. Yoshihara, H. (1985): "Numerical Solution of Two-Dimensional
Reference Test Cases", in Test Cases For Inviscid Flow Field
Methods, AGARD AR-211.

16. Fortin, F. and Jones, D.J.: "Unstructured Grid Solutions using
k-ε with Wall Functions", Proceedings of Workshop on Efficient
Turbulence Models for Aerodynamics (Notes in Numerical Fluids),
Editors: A. Dervieux, J. P. Dusage, L. J. Johnston, Vieweg, to be
published.

17. Fortin, F., Hawken, D.F., Jones, D.J. and Symms, G.F.: "A
Comparison of Two Commercial Euler and Navier-Stokes CFD Codes", CFD
Society of Canada Conference, CFD 95, June 1995.

18. GFEM, Electronic Data Systems Corporation, Unigraphics Division,
13736 Riverport Drive, Maryland Heights, MO, USA.

19. Carr, M.P. and Pallister, K.C. (1984): "Pressure Distributions
Measured on Research Wing M100 Mounted on an Axisymmetric Body", in
Experimental Data Base for Computer Program Assessment, AGARD
AR-138.

20. Marconi, F., Siclari, M., Carpenter, G. and Chow, R.:
"Comparison of TLNS3D Computations with Test Data for a Transport
Wing/Simple Body Configuration", AIAA Paper 94-2237, 1994.

21. Zhang, H., Reggio, M., Trepanier, J.Y. and Camarero, R.:
"Discrete Form of the GCL for Moving Meshes and its Implementation
in CFD Schemes", Computers and Fluids, Vol. 22, No. 1, pp. 9-23,
1993.

22. Batina, J.T.: "Unsteady Euler Algorithm with Unstructured
Dynamic Mesh for Complex-Aircraft Aerodynamic Analysis", AIAA
Journal, Vol. 29, No. 3, March 1991, pp. 327-333.

23. Chakravarthy, S.R. and Szema, K-Y.: "Computational Fluid
Dynamics Capability for Internally Carried Store Separation",
Rockwell Intl Corp Report SC-71039-TR, Science Centre, Thousand
Oaks, CA.

24. "AGARD Compendium of Unsteady Aerodynamic Measurements",
AGARD-R-702, Data Set 3, 1982.

25. Trepanier, J.Y. and Yang, H.: "ADX: Algorithms for Adaptive
Discretization based on Triangular Grids", Technical Report
EPM/RT-93/3, Ecole Polytechnique de Montreal, 1993.

26. Gaitonde, A.L. and Fiddes, S.P.: "A three-dimensional moving
mesh method for the calculation of unsteady transonic flows",
Aeronautical Journal, April 1995.

27. Singh, K.P. et al.: "Dynamic Unstructured Method for Flows Past
Multiple Objects in Relative Motion", AIAA Paper 94-0058, Jan 1994.


Fig 1a. Unstructured Delaunay Grid obtained for RAE 2822,
showing poor quality of grid.

Fig 1b. Hybrid (Structured/Unstructured) Grid for RAE 2822

Fig 2a. Euler Solution obtained with the Hybrid Grid.
RAE 2822 airfoil at M=0.75, α=3°

Fig 2b. Navier-Stokes Solution using Hybrid Grid.
RAE 2822 airfoil, M=0.734, α=2.79°, Re=6.5E6

View of coarse mesh for RAE 2822 wing

Fig 5b. Euler Solutions for the M-100 Wing-Body.
Various Spanwise locations. M=0.8028, α=2.873°
Several Computations Shown.

Fig 8b. CN Values for three cycles for the store (top),
airfoil/pylon (bottom) and total (middle)

Fig 8c. Pitching Moment Values for three cycles for the store (top),
airfoil/pylon (bottom) and total (middle)

1. SUMMARY

A three dimensional finite volume scheme is presented. The
scheme is based on the employment of hybrid grids, containing
tetrahedral as well as prismatic cells.

The application of hybrid grids offers the possibility to combine
the flexibility of tetrahedral meshes with the accuracy of regular
grids. An algorithm to compute an auxiliary grid of control
volumes for the entire computational domain was formulated. The
dual mesh technique guarantees conservation in the entire flow
field even at interfaces between prismatic and tetrahedral
domains and enables the employment of an accurate upwind flow
solver. Convergence to the steady state can be accelerated by a
multigrid algorithm based on the agglomeration of control
volumes. The formulation of such an algorithm is presented.

The code is tested on several viscous and inviscid cases for
transonic and subsonic flows.


2. INTRODUCTION

The calculation of stationary flow fields around aircraft can be
regarded as one of the major tasks of CFD. Due to the progress
made in the development of high performance computers, the
simulation of flows even around quite complex configurations
has become feasible. Therefore CFD methods nowadays have
a considerable impact on the aerodynamic design of airplanes.

One of the first requirements to be met by the applied numerical
method is related to the problem turnaround time. To make
a scheme usable in the design process, it has to fit into
industrial time scales. Including the generation of appropriate
computational grids, the solution for a certain problem should be
obtained within a few days or less.

The accurate resolution of the flow field in the vicinity of solid
walls has a considerable impact on the correct prediction of
aerodynamic forces on the configuration. Strong solution gradients
normal to the surface occur. A simulation of such flow phenomena
requires a high point density in gradient direction. As for
efficiency reasons usually a lower point density in the directions
tangential to the wall is utilized, high aspect ratio cells are most
suited for the flow resolution in those regions.

One class of schemes widely employed in practical use is based
on structured grids, consisting of blocks of hexahedral cells [1].
As it is feasible to stretch hexahedral cells in one or two
directions without losing grid quality, structured grids are
appropriate for the simulation of high Reynolds number flows. The
major drawback of structured schemes is related to the generation
of suitable grids for complex geometries. Grid generation normally
is an iterative process [2] that requires a high level of user
interaction. Though much effort has been spent in recent years to
develop powerful tools, the generation of appropriate structured
grids for complex geometries appears to be much more time
consuming than the flow simulation.

A possibility to circumvent this bottleneck is the unstructured
approach [3, 4]. The flow simulation is performed on a grid
consisting of tetrahedral cells instead of hexahedra. As tetrahedral
cells offer a high flexibility, the discretization of complex three
dimensional domains can be done almost automatically [5], with
less user interaction than required for generating structured grids.
The weak point of the unstructured approach is the generation
of grids for high Reynolds number flows. The efficient simulation
of such flows requires extremely stretched cells. The edges
of tetrahedral cells of high aspect ratio are connected under very
acute angles. This may cause numerical errors when the fluxes
for corner nodes of such cells are evaluated. Hence, convergence
and even solution accuracy can be degraded.

A compromise between structured and unstructured schemes is
the application of hybrid schemes [6]. Hybrid grids consist of
regular cells with edges exclusively normal and tangential to
the surface of the geometry. At some distance from the surface,
where the viscous impact on the flow has almost vanished,
tetrahedral cells are employed to discretize the physical space
between the regular domains and the outer boundaries. Considering
the shape of the cells in the regular part there are several
possibilities. The surface discretization with quadrilateral
elements leads to hexahedral cells, while a surface triangulation
results in prismatic cells. Due to the higher flexibility of triangles
compared to quadrilaterals, a higher level of automation can
be achieved when using prismatic cells.

The aim of this work is to develop a numerical scheme that
offers the possibility to employ pure tetrahedral and prismatic
grids as well as hybrid grids consisting of prismatic cells in the
vicinity of solid walls and of tetrahedral cells in the rest of the
flow domain.


3.  GOVERNING  EQUATIONS 

The Navier-Stokes equations for the three dimensional case can
be written in conservative form as

$$\frac{\partial}{\partial t} \int_V \vec{W} \, dV + \oint_{\partial V} \bar{F} \cdot \vec{n} \, dS = 0 \qquad (1)$$

where

$$\vec{W} = (\rho,\ \rho u,\ \rho v,\ \rho w,\ \rho E)^T$$

is the vector of the conserved quantities. $V$ denotes an arbitrary
control volume with the boundary $\partial V$ and the outer normal $\vec{n}$.

The flux tensor $\bar{F}$ is composed of the flux vectors in the three
coordinate directions:

$$\bar{F} = F \cdot \vec{e}_x + G \cdot \vec{e}_y + H \cdot \vec{e}_z$$

with $\vec{e}_x$, $\vec{e}_y$ and $\vec{e}_z$ being unit vectors in the coordinate
directions. The flux vectors F, G and H may be divided into their
convective and viscous parts as

$$F = F^c + F^v, \quad G = G^c + G^v, \quad H = H^c + H^v .$$

The viscous parts contain the terms

$$\Gamma_x = u \sigma_{xx} + v \tau_{yx} + w \tau_{zx}$$
$$\Gamma_y = u \tau_{xy} + v \sigma_{yy} + w \tau_{zy}$$
$$\Gamma_z = u \tau_{xz} + v \tau_{yz} + w \sigma_{zz}$$

The normal and tangential stresses depend on the derivatives of
the velocity and on the dynamic viscosity $\mu$.

The viscosity $\mu$ can be calculated employing Sutherland's law

$$\frac{\mu}{\mu_\infty} = \left( \frac{T}{T_\infty} \right)^{3/2} \frac{1 + S_c}{T/T_\infty + S_c} \qquad (3)$$

where $S_c$ is a constant depending on the free stream temperature
$T_\infty$:

$$S_c = \frac{110.4\,\mathrm{K}}{T_\infty} .$$

The pressure is calculated by the equation of state

$$p = (\kappa - 1) \, \rho \left( E - \frac{u^2 + v^2 + w^2}{2} \right) . \qquad (4)$$

From equation (1) the temporal change of the conservative
variables W can be derived as:

$$\frac{\partial \vec{W}}{\partial t} = - \frac{1}{V} \oint_{\partial V} \bar{F} \cdot \vec{n} \, dS . \qquad (5)$$

The change of the flow conditions in a certain control volume V
is given by the flux over the control volume boundary $\partial V$ related
to the size of V. For a control volume fixed in time and space,
equation (5) can be written as:

$$\frac{d \vec{W}}{d t} = - \frac{1}{V} \, \vec{Q} \qquad (6)$$

with Q representing the fluxes over the boundaries of the control
volume. If the boundary is divided into n faces, Q is given by

$$\vec{Q} = \sum_{i=1}^{n} \vec{Q}_i = \sum_{i=1}^{n} \left( \vec{Q}_i^c + \vec{Q}_i^v \right)$$

where $\vec{Q}_i^c$ and $\vec{Q}_i^v$ denote the inviscid and the viscous flux over
the respective face.
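
As a small illustration of the two closure relations of this section, Sutherland's law and the equation of state, the following sketch evaluates both. The function names, the default value kappa = 1.4 and the nondimensional form of the viscosity (ratio to the free stream value) are assumptions made for this example and are not taken from the paper.

```python
def sutherland_ratio(temperature_ratio, t_inf_kelvin, s_kelvin=110.4):
    """Viscosity ratio mu/mu_inf from Sutherland's law.

    temperature_ratio is T/T_inf; s_kelvin is the Sutherland constant.
    The nondimensional form is an assumption of this sketch.
    """
    s_c = s_kelvin / t_inf_kelvin
    return temperature_ratio ** 1.5 * (1.0 + s_c) / (temperature_ratio + s_c)


def pressure(rho, u, v, w, total_energy, kappa=1.4):
    """Pressure of a perfect gas from the conserved quantities.

    total_energy is the specific total energy E; kappa = 1.4 (air) is assumed.
    """
    kinetic = 0.5 * (u * u + v * v + w * w)
    return (kappa - 1.0) * rho * (total_energy - kinetic)
```

At the free stream temperature the viscosity ratio is one by construction, which is a convenient sanity check.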

4.  DATA  STRUCTURE 

The dual mesh technique is perfectly suited to be utilized in
a scheme that is based on hybrid grids. From the initial grid
an auxiliary grid of control volumes is generated. For a vertex
based scheme, where the flow variables are stored in the nodes of
the initial grid, each node is surrounded by a control volume. The
boundaries of the control volumes are determined by the
midpoints of cells, cell faces and edges of the initial grid. This
strategy results in non overlapping auxiliary cells that fill the
physical space without gaps. Figure 1 depicts such an auxiliary grid
(dashed lines) for an initial hybrid grid (solid lines). As can be
seen from the figure, the auxiliary grid is defined even at
interfaces between the different cell types. Hence, focusing on the
fluxes crossing the boundaries of the control volumes,
conservation can be guaranteed in the entire flow domain.


Fig.  1  :  Mesh  of  control  volumes  for  a  two  di¬ 
mensional  hybrid  grid 


For each initial cell contributions to the auxiliary grid have to
be determined. As this can be done cell by cell without
information about neighboring cells, the auxiliary grid can be
evaluated within one loop running over all cells. Therefore, the
evaluation of the auxiliary grid is quite cheap in terms of
computational time.


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The  control  volumes  are  composed  of  several  faces.  Each  edge 
connecting  two  initial  nodes  is  related  to  one  face  of  the  auxil¬ 
iary  grid.  While  in  two  dimensions  the  edges  are  shared  by  two 
cells,  in  three  dimensions  the  number  of  cells  sharing  one  edge 
is not constant. In figure 2 an edge connecting the nodes P0 and
P1 is shown. It is surrounded by four tetrahedral cells. For each
cell  two  triangles  form  one  part  of  the  face,  so  it  is  composed  of 
eight  triangles.  As  the  size  and  the  orientation  of  triangles  can 
be  described  by  normal  vectors  the  sum  of  the  normal  vectors 
describes  the  size  and  the  orientation  of  the  entire  face.  The  re¬ 
sulting  vector  is  also  related  to  the  respective  edge. 
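
The construction described above can be sketched for a single tetrahedron as follows. For the edge P0-P1 the cell contributes two triangles, spanned by the edge midpoint, the centroids of the two cell faces containing the edge, and the cell centroid. The sketch sums their normal vectors and orients the result from P0 to P1. All function names are hypothetical, and the orientation test via the triple product is an assumption of this example rather than the paper's implementation. Summing the returned vectors over all cells sharing the edge would give the full face vector.

```python
def sub(a, b):
    return tuple(x - y for x, y in zip(a, b))


def add(*vectors):
    return tuple(sum(c) for c in zip(*vectors))


def scale(s, a):
    return tuple(s * x for x in a)


def cross(a, b):
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def centroid(*points):
    return scale(1.0 / len(points), add(*points))


def triangle_normal(a, b, c):
    """Area-weighted normal of the triangle (a, b, c)."""
    return scale(0.5, cross(sub(b, a), sub(c, a)))


def edge_face_vector_in_tet(p0, p1, p2, p3):
    """Contribution of one tetrahedron to the dual face of edge p0-p1."""
    m = centroid(p0, p1)              # edge midpoint
    fa = centroid(p0, p1, p2)         # centroid of first adjacent cell face
    fb = centroid(p0, p1, p3)         # centroid of second adjacent cell face
    c = centroid(p0, p1, p2, p3)      # cell centroid
    s = add(triangle_normal(m, fa, c), triangle_normal(m, c, fb))
    # Orient from p0 to p1 using the sign of the cell's triple product.
    orientation = dot(cross(sub(p1, p0), sub(p2, p0)), sub(p3, p0))
    return s if orientation > 0 else scale(-1.0, s)
```

For the unit tetrahedron the three face vectors attached to P0 add up to one third of the outward area vector of the face opposite P0, which follows from the fact that the dual sub-volume around P0 is closed.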


Fig.  2  :  Face  of  a  three  dimensional  control  vol¬ 
ume 

The fluxes along an edge between two nodes can be interpreted
as fluxes crossing the auxiliary grid face related to the edge. The
fluxes between the grid points are computed within one sweep
over all edges. The information required to compute the
fluxes and to assign them to the respective nodes is:

•  Geometrical  coordinates  of  the  grid  nodes 

•  Edge  to  node  pointer 

•  Components  of  the  face  vectors 


with the left and right states

$$(\rho a,\ \rho a u,\ \rho a v,\ \rho a w,\ \rho a H)^T_{L/R}$$

and the pressure contribution

$$(0,\ s_x p_F,\ s_y p_F,\ s_z p_F,\ 0)^T$$


Fig. 3: Control volumes around neighboring
nodes P0 and P1


The speed of sound a can be obtained from the relation

$$a = \sqrt{\kappa \frac{p}{\rho}} .$$

$M_F$ denotes the advection Mach number at the cell face:


Hence, both the description of the grid and the computation of
the grid fluxes are based on the edges. Information about the
initial grid cells is not required any more. This strategy leads
to a very efficient memory allocation of less than 100 real
variables per node and a good vectorization of the flow solver.

The  preprocessing,  including  the  determination  of  the  control 
volumes  of  the  auxiliary  grid  as  well  as  the  components  of  the 
face  vectors,  has  to  be  executed  before  the  flow  calculation 
starts. 


$$M_F = M_L^+ + M_R^- \qquad (9)$$

where the split Mach numbers $M_{L/R}^{\pm}$ are defined as
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5.  SPATIAL  DISCRETIZATION 
5.2  Calculation  of  convective  Terms 

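The edge based data structure described in section 4 forms
the basis for the employment of the accurate AUSM upwind
scheme, as presented by Liou and Steffen in [7]. Considering an
edge connecting two nodes P0 and P1, as illustrated in figure 3,
the inviscid flux $\vec{Q}^c_{0,1}$ over face F can be interpreted as a sum of
a Mach number weighted average of the left (L) and the right
(R) state at the face F and a scalar dissipative term:

$$\vec{Q}^c_{0,1} = |\vec{S}_{0,1}| \left( \frac{1}{2} M_F \left[ \begin{pmatrix} \rho a \\ \rho a u \\ \rho a v \\ \rho a w \\ \rho a H \end{pmatrix}_L + \begin{pmatrix} \rho a \\ \rho a u \\ \rho a v \\ \rho a w \\ \rho a H \end{pmatrix}_R \right] - \frac{1}{2} \Phi_F \left[ \begin{pmatrix} \rho a \\ \rho a u \\ \rho a v \\ \rho a w \\ \rho a H \end{pmatrix}_R - \begin{pmatrix} \rho a \\ \rho a u \\ \rho a v \\ \rho a w \\ \rho a H \end{pmatrix}_L \right] + \begin{pmatrix} 0 \\ s_x p_F \\ s_y p_F \\ s_z p_F \\ 0 \end{pmatrix} \right)$$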


Herein M denotes the Mach number of the flow normal to the
cell face.

The pressure $p_F$ at the cell face is calculated in a similar way as

$$p_F = p_L^+ + p_R^- \qquad (10)$$

where $p_{L/R}^{\pm}$ denote the split pressures.

The coefficient $\Phi_F$ controls the dissipation of the scheme. In the
original scheme of Liou, $\Phi_F$ is set to

$$\Phi_F = |M_F| . \qquad (11)$$

For small Mach numbers the dissipative character vanishes since
$\Phi_F$ also becomes small. In order to prevent the disappearance
of the dissipation for small Mach numbers, $\Phi_F$ is determined as
proposed by Kroll and Radespiel in [8].
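
The definitions of the split Mach numbers and split pressures are not legible in this copy of the text. As a hedged illustration, the sketch below uses the standard polynomial splittings commonly associated with the original AUSM scheme of Liou and Steffen. That the paper uses exactly these forms is an assumption, and all function names are hypothetical.

```python
def split_mach(mach):
    """Return (M_plus, M_minus) for a face-normal Mach number."""
    if abs(mach) <= 1.0:
        return 0.25 * (mach + 1.0) ** 2, -0.25 * (mach - 1.0) ** 2
    return 0.5 * (mach + abs(mach)), 0.5 * (mach - abs(mach))


def split_pressure(p, mach):
    """Return (p_plus, p_minus) for pressure p and face-normal Mach number."""
    if abs(mach) <= 1.0:
        return (0.25 * p * (mach + 1.0) ** 2 * (2.0 - mach),
                0.25 * p * (mach - 1.0) ** 2 * (2.0 + mach))
    return (0.5 * p * (mach + abs(mach)) / mach,
            0.5 * p * (mach - abs(mach)) / mach)


def face_mach(mach_left, mach_right):
    """Advection Mach number M_F = M_L^+ + M_R^- as in equation (9)."""
    return split_mach(mach_left)[0] + split_mach(mach_right)[1]


def face_pressure(p_left, mach_left, p_right, mach_right):
    """Face pressure p_F = p_L^+ + p_R^- as in equation (10)."""
    return (split_pressure(p_left, mach_left)[0]
            + split_pressure(p_right, mach_right)[1])
```

With identical left and right states these splittings return the state itself, and for supersonic flow the face values are taken entirely from the upwind side.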

The values left and right of the face F are taken directly from
P0 and P1 for first order calculations. For second order accurate
calculations the independent flow variables are linearly
reconstructed on the control volumes around P0 and P1. For the
control volume of node P0 it reads:

$$u_L = u_0 + \nabla u_0 \cdot \frac{1}{2} \vec{r}_{0,1} . \qquad (12)$$

The gradient $\nabla u_0$ of a variable u is obtained by employing a
Green-Gauss formula:

$$\nabla u_0 = \frac{1}{V_0} \sum_{i} \frac{1}{2} \left( u_0 + u_i \right) \vec{S}_{0,i} \qquad (13)$$
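
A minimal sketch of a Green-Gauss gradient on a dual control volume is given below. It assumes that the face value is the arithmetic mean of the two node values on an edge and that the face vectors point away from the central node. The function name and the data layout are hypothetical.

```python
def green_gauss_gradient(u0, neighbors, volume):
    """Approximate the gradient of u at a node.

    neighbors is a list of (u_i, S_0i) pairs, where S_0i is the face vector
    (a 3-tuple) of the dual face between the node and neighbor i.
    """
    grad = [0.0, 0.0, 0.0]
    for u_i, face_vector in neighbors:
        u_face = 0.5 * (u0 + u_i)          # arithmetic face average
        for d in range(3):
            grad[d] += u_face * face_vector[d]
    return tuple(g / volume for g in grad)
```

On a cubic control volume with six axis-aligned neighbors the formula reproduces the gradient of a linear field exactly, which makes a simple check.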


5.3  Calculation  of  Viscous  Terms 

The determination of the viscous terms is also performed
edgewise. The obtained fluxes are related to the nodes associated
with the respective edge. For an edge connecting the nodes P0
and P1 with the face vector $\vec{S}_{0,1}$ (figure 3) one obtains:

with $\Gamma_x$, $\Gamma_y$ and $\Gamma_z$ as described in section 5.1.

The derivatives of a flow variable u have already been
obtained for the second order discretization as described in
section 5.2. They are the components of the gradient vectors
$\nabla u = (u_x, u_y, u_z)^T$. The face values are determined by an
arithmetic averaging of the respective values in the nodes P0 and P1.

6.  TEMPORAL  DISCRETIZATION 

The  temporal  variation  of  the  flow  quantities  can  be  written  in 
general  form  for  a  node  Pq  as: 


where $V_0$ is the volume of the dual cell around P0 and $\vec{S}_{0,i}$ is the
normal vector of the dual mesh face F as shown in figure 4.


Fig.  4  :  Face  of  a  three  dimensional  control  vol¬ 
ume 

Near shocks the values on the edges have to be limited to avoid
overshoots. The limiting is done by a minimum/maximum clipping
as proposed by Barth in [9]. If a reconstructed value
at any face of the control volume exceeds the minimum (or
maximum) of the values given by node P0 and the surrounding nodes
P1...6 (figure 4), the gradient $\nabla u_0$ is scaled by a factor $\Phi_0$, so the
reconstructed value becomes equal to the minimum (or maximum)
over these nodes:

$$\nabla u_{0,lim} = \Phi_0 \cdot \nabla u_0 , \qquad (14)$$

with

Herein $u^{max}$ and $u^{min}$ denote the maximum/minimum of the
values of u at these nodes and $u_{i,r}$ denotes the reconstructed value
at the face of the control volume between P0 and Pi. The values
at the faces are reconstructed as described in equation (12).
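
The definition of the limiting factor is not legible in this copy. The sketch below shows the minimum/maximum clipping in the form usually attributed to Barth and Jespersen. That the paper uses this exact expression is an assumption, and the function name is hypothetical.

```python
def barth_limiter(u0, neighbor_values, face_values):
    """Scaling factor for the gradient at a node.

    u0              value at the node
    neighbor_values values at the surrounding nodes
    face_values     unlimited reconstructed values at the control volume faces
    """
    u_max = max([u0] + list(neighbor_values))
    u_min = min([u0] + list(neighbor_values))
    phi = 1.0
    for u_face in face_values:
        delta = u_face - u0
        if delta > 0.0:
            phi = min(phi, (u_max - u0) / delta)
        elif delta < 0.0:
            phi = min(phi, (u_min - u0) / delta)
    return phi
```

Because the reconstruction is linear, multiplying the gradient by the returned factor places the most critical face value exactly on the local minimum or maximum, while reconstructions already inside the bounds are left untouched (factor 1).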


$$\frac{d}{dt} \vec{W}_0 + \vec{R}_0 = 0 . \qquad (16)$$

A comparison with equation (6) gives for the residual $\vec{R}_0$:

$$\vec{R}_0 = \frac{1}{V_0} \, \vec{Q}_0 . \qquad (17)$$

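The integration in time is performed utilizing an explicit
Runge-Kutta scheme, as described by Jameson in [10]:

$$\begin{aligned}
W^{(0)} &= W^{(n)} \\
W^{(1)} &= W^{(0)} - \alpha_1 \Delta t \, R^{(0)} \\
&\ \ \vdots \\
W^{(m)} &= W^{(0)} - \alpha_m \Delta t \, R^{(m-1)} \\
W^{(n+1)} &= W^{(m)}
\end{aligned} \qquad (18)$$

Within the framework of this paper a three step scheme is
employed with the coefficients

$$\alpha_1 = 0.15, \quad \alpha_2 = 0.4 \quad \text{and} \quad \alpha_3 = 1.0 .$$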
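
The multistage scheme can be sketched in a few lines. Each stage restarts from the solution at the beginning of the time step and uses the residual of the previous stage. The function name and the scalar test problem are assumptions of this example.

```python
ALPHAS = (0.15, 0.4, 1.0)  # three step coefficients quoted in the text


def runge_kutta_step(w, residual, dt, alphas=ALPHAS):
    """Advance w by one time step of dW/dt + R(W) = 0.

    residual is a callable returning R(W); w may be a float or any type
    supporting subtraction and scalar multiplication.
    """
    w0 = w
    w_stage = w0
    for alpha in alphas:
        w_stage = w0 - alpha * dt * residual(w_stage)
    return w_stage
```

For the model problem R(W) = W with dt = 0.1, one step gives 1 - 0.1 + 0.4 * 0.01 - 0.06 * 0.001 = 0.90394 times the initial value.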


For the control volume surrounding node P0 in figure 4 the
convective time step $\Delta t_0^c$ and the viscous time step $\Delta t_0^v$ have to be
determined. The resulting time step can be written as:

$$\Delta t_0 = CFL \cdot \frac{\Delta t_0^c \, \Delta t_0^v}{\Delta t_0^c + \Delta t_0^v} \qquad (19)$$

with CFL being the Courant number. The convective time step
$\Delta t_0^c$ can be calculated as:

$$\Delta t_0^c = \frac{V_0}{\lambda_0^c}$$

where $\lambda_0^c$ denotes the maximum eigenvalue of the flux Jacobian.
It can be determined as an integration over the surface of the
control volume:

$$\lambda_0^c = \sum_{i=1}^{6} \left( |\vec{v}_{0i} \cdot \vec{S}_{0,i}| + a_{0i} \cdot |\vec{S}_{0,i}| \right) \qquad (20)$$

with $\vec{S}_{0,i}$ representing the face vectors of the control volume face
for the ith neighbor of P0 and $\vec{v}_{0i}$ the face velocity vector.

Following [11] the viscous time step $\Delta t_0^v$ has to be scaled with a
factor of 0.25.

Employing an integration around the control volume, the
viscous eigenvalue $\lambda_0^v$ can be written as:

(21)


The selection of the start node and the strategy for deciding which
of the neighboring control volumes are fused with the control
volume of the start node are the only means to control the quality
of the coarse grid. It appears that the best grid quality can
be obtained when the agglomeration marches along coarsening
fronts throughout the grid. Furthermore, nodes lying on
solid walls should preferably remain in the coarse grid. So
the highest priority to become the next start node is given
to wall nodes lying on the coarsening front.


7.  MULTIGRID  ALGORITHM 

7.1  Multigrid  Strategies  for  Unstructured  Grids 

The  acceleration  of  the  convergence  is  necessary  for  the  sim¬ 
ulation  of  high  Reynolds  number  flows.  A  very  powerful  tool 
that  can  be  utilized  with  an  explicit  time-stepping  scheme  is  the 
multigrid  method. 

Focusing on unstructured grids, there are several approaches
for the formulation of a multigrid algorithm. The differences
between these approaches lie in the strategy of generating the
coarser grids.

One frequently utilized method employs independent grids. A
set of successively coarser grids is generated around the
respective geometry independently from each other [12, 13]. As
the grids are not nested, expensive search algorithms are
required to determine the operators needed to transfer the flow
variables and the residuals between the different grids. Another
drawback is the limitation of the cell size. It should not exceed
the size of details of the geometry, since otherwise the correct
representation of the surface cannot be guaranteed.

The  same  holds  for  a  different  strategy,  the  use  of  telescoping 
points.  In  this  case  certain  points  are  selected  to  remain  in  the 
coarser  grids.  These  points  are  then  reconnected  using  some  tri¬ 
angulation  algorithms  [14].  The  preprocessing  described  above 
has  to  be  performed  for  each  level  again.  Since  one  has  to  select 
existing  points  of  the  fine  grid  to  become  coarse  grid  points  the 
quality  of  the  coarse  grids  is  worse  than  the  quality  of  indepen¬ 
dently  generated  coarse  grids.  For  hybrid  grids  this  approach  is 
not  suited  as  different  algorithms  would  have  to  be  used  to  select 
and  reconnect  the  points  in  the  prismatic  and  tetrahedral  regions. 

The agglomeration of control volumes, as described e.g. by
Lallemand et al. in [15] or Venkatakrishnan and Mavriplis in
[16], can be regarded as a special case of the second approach.
Certain points of the fine grid remain in the coarse grid as well,
but the control volumes of the fine grid nodes are fused together
in order to form the coarse grid control volumes. Since the focus
is on the control volumes, the surface representation is
guaranteed by definition. For a hybrid scheme using the dual mesh
technique this approach is perfectly suited. As in the solution
process, the shape of the initial grid cells is not important in the
agglomeration procedure. One problem that may occur is the quality
of the coarser grids, as one has to deal with points that exist also
in the finest grid.

7.2  Agglomeration  Process 

The  agglomeration  process  starts  with  the  choice  of  a  start  node, 
that  will  remain  in  the  coarser  grid.  The  control  volume  of  the 
start  node  is  fused  with  control  volumes  of  neighboring  nodes. 
After  having  agglomerated  the  control  volumes,  the  process 
starts  again  with  the  choice  of  a  new  start  node.  So  agglomera¬ 
tion  of  control  volumes  is  a  greedy  process,  that  is  not  expensive 
in  terms  of  calculation  time. 


A simple strategy that works well is to fuse all control
volumes of neighboring nodes that are not agglomerated yet with
the control volume of the start node. Nevertheless, one can think
of more sophisticated algorithms, such as fusing only the
n nearest neighbors, with n ≈ 8, or, as Venkatakrishnan and
Mavriplis propose in [17], maximizing the ratio of volume to
surface of the coarse grid control volumes, which results in a kind
of semi coarsening for Navier-Stokes grids.
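
The simple strategy can be sketched as a greedy sweep over the node graph: a start node keeps its place in the coarse grid, absorbs all neighbors that are still free, and the next start nodes are taken from the front just behind the absorbed neighbors. The sketch assumes a connected graph given as an adjacency dictionary and omits the wall node priority; the function name and data layout are hypothetical.

```python
from collections import deque


def agglomerate(neighbors, first_start_node=0):
    """Greedy agglomeration of control volumes.

    neighbors maps each fine grid node to a list of adjacent nodes.
    Returns a dict mapping every fine grid node to its coarse grid node.
    """
    parent = {}
    front = deque([first_start_node])
    while front:
        start = front.popleft()
        if start in parent:
            continue                      # already fused into another volume
        parent[start] = start             # start node remains in coarse grid
        children = [n for n in neighbors[start] if n not in parent]
        for child in children:
            parent[child] = start
        for child in children:            # advance the coarsening front
            for candidate in neighbors[child]:
                if candidate not in parent:
                    front.append(candidate)
    return parent
```

On a chain of six nodes the sweep keeps every second node, so the coarse grid has half as many control volumes as the fine grid.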


Fig.  5  :  Coarse  grid  obtained  by  agglomeration 
of  control  volumes 


As depicted in figure 5 the coarse grid consists, like the fine grid,
of a set of nodes surrounded by control volumes and connected
by edges. Each edge is related to one face of the auxiliary grid
of control volumes. The only difference to the fine initial grid is
that the edges do not have to form any specially shaped cells any
more. The information needed to describe the coarser grid is
determined directly from the fine auxiliary grid.


Fig.  6:  Agglomeration  of  control  volumes 

Figure 6 illustrates the agglomeration of several fine control
volumes to a coarse control volume around node P0. The size of the
new control volume is

$$V_{0,k} = \sum_{i=0}^{n} V_{i,k-1} \qquad (22)$$

with $V_{i,k-1}$ being the size of control volume i in the grid k-1.


Fig.  7  :  Determination  of  coarse  grid  face  vectors 

As presented in figure 7, the normal vector $\vec{S}_{(0,j),k}$ of the auxiliary
grid face related to the edge between node $P_{0,k}$ and a neighboring
node $P_{j,k}$ in the coarse grid is the sum of all face vectors related
to edges between children of $P_{0,k}$ and $P_{j,k}$, in this case:

$$\vec{S}_{(0,j),k} = \vec{S}_{(2,11),k-1} + \vec{S}_{(2,12),k-1} + \vec{S}_{(4,11),k-1}$$


The obtained coarse grid has the same properties as the fine
grid. Hence, the governing equations can be discretized on the
coarse grid employing identical algorithms as on the fine grid.
Furthermore, the coarse grid control volumes can be
agglomerated again using the same strategies as for the
agglomeration of the fine grid control volumes. In this way a set
of grids can be created easily based on the fine grid.


7.3  Transfer  of  Flow  Variables 

The multigrid iteration starts with the performance of one time
step on the finest grid. The time step on a fine grid k-1 gives
the solution $W_{k-1}$. This solution is transferred to the next coarser
grid k with a suitable transfer operator.

As the physical position of coarse grid nodes is identical to the
position of the nodes in the grid k-1, the transfer of flow
variables from the fine grid to the coarser grid is just an injection
of the respective values:


7.4  Evaluation  of  Restriction  Operator 

Following  [10]  the  Forcing-Function  can  be  formulated  as: 

=  (24) 

with $R_k(W_k)$ being the residuals obtained according to
equation (17) for the solution $W_k$.

The restriction operator $I_{k-1}^{k}$ depends on the relations between
the grids k-1 and k. As the physical space of the coarse grid
control volume around node $P_{0,k}$ in figure 6 is identical to the
space of the children of $P_{0,k}$ in the finer grid, the residuals of all
children have to be summed up. The contributions are weighted
with respect to the size of the children cells:


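$$I_{k-1}^{k} R_{k-1} = \frac{1}{V_{sum}} \sum_{i=1}^{n} R_{i,k-1}(W_{k-1}) \cdot V_{i,k-1} . \qquad (25)$$

In this equation $V_{sum}$ denotes the sum of the volumes of the
children of $P_{0,k}$. According to equation (22) $V_{sum}$ equals the coarse
grid control volume $V_{0,k}$ around $P_{0,k}$, so equation (25) can also be
written as:

$$\sum_{i=1}^{n} R_{i,k-1}(W_{k-1}) \cdot V_{i,k-1} = I_{k-1}^{k} R_{k-1} \cdot V_{0,k} .$$

As stated in equation (17) the product of residual R for any node
P and the volume V of the control volume equals the fluxes Q
crossing the boundary of the control volume. Therefore one can
write:

$$I_{k-1}^{k} R_{k-1} \cdot V_{0,k} = \sum_{i=1}^{n} Q_{i,k-1} .$$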


As  the  fluxes  between  two  fine  grid  control  volumes  that  are 
fused  together  cancel  each  other,  the  sum  term  in  this  equation 
denotes  the  flux  over  the  coarse  grid  control  volume  achieved  by 
a  fine  grid  discretization.  The  forcing  function  can  be  interpreted 
as  the  difference  between  the  fluxes  obtained  by  a  fine  grid  dis¬ 
cretization  and  the  fluxes  achieved  by  a  coarse  grid  discretiza¬ 
tion  for  a  coarse  grid  control  volume. 
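
The volume weighted restriction of the residuals can be sketched as follows. The sketch takes a child-to-parent map of the kind produced by an agglomeration step and returns the coarse grid volumes and restricted residuals. The function name and the dictionary-based data layout are assumptions of this example.

```python
def restrict_residuals(parent, volumes, residuals):
    """Volume weighted restriction from fine to coarse control volumes.

    parent    maps each fine node to its coarse node
    volumes   maps each fine node to its control volume size
    residuals maps each fine node to its residual (per unit volume)
    Returns (coarse_volumes, coarse_residuals) keyed by coarse node.
    """
    coarse_volumes = {}
    flux_sums = {}
    for node, coarse in parent.items():
        coarse_volumes[coarse] = coarse_volumes.get(coarse, 0.0) + volumes[node]
        # residual times volume is the net flux of the fine control volume
        flux_sums[coarse] = flux_sums.get(coarse, 0.0) + residuals[node] * volumes[node]
    coarse_residuals = {c: flux_sums[c] / coarse_volumes[c] for c in coarse_volumes}
    return coarse_volumes, coarse_residuals
```

Since residual times volume is the net flux of a control volume, the restricted residual multiplied by the coarse volume equals the sum of the children's net fluxes, in which the fluxes across interior faces cancel.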


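The temporal discretization is performed as described in section
6, while the residuals $R_k$ are replaced by the expression $R_k + P_k$.
Equation (18) then reads:

$$W_k^{(1)} = W_k^{(0)} - \alpha_1 \Delta t \left( R_k(W_k^{(0)}) + P_k \right)$$
$$\vdots$$
$$W_k^{(m)} = W_k^{(0)} - \alpha_m \Delta t \left( R_k(W_k^{(m-1)}) + P_k \right)$$

The result serves as the basis for the next coarser grid k+1:

$$W_{k+1}^{(0)} = W_k^{(m)}$$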

7.5  Determination  of  Prolongation  Operator 

After having determined the solution for all grids, corrections
coming from the coarse grids k+1 ... m are transferred to
the grid k employing suitable transfer operators. For this grid one
obtains the corrected solution as:


while  for  the  coarsest  grid  m  one  can  write: 


(26) 

(27) 


Figure  8  shows  a  fine  auxiliary  grid  k  and  a  coarser  grid  k-\-\. 
The  nodes  P]  and  P5  ig  are  coarse  grid  nodes.  The  corrections 
for  node  Pj  the  correction  coming  from  grid -F  1  are  computed 
as; 

^o,(t:+i,M  “  ^o,i+i  “^oi+i  ■ 
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Fig.  8  ;  Evaluation  of  coarse  grid  corrections 


where  is  the  corrected  value  for  grid  k+\.  Since  the  po¬ 
sition  of  node  Pq  is  identical  on  both  grids  k  and  /c  -f  1 ,  one  can 


write; 


0,k+\ 


(29) 


what  leads,  together  with  equation  (23),  to: 


w(^)  _  wi")  A-C  , 

^o,k  -%,k+'-o,(k+\,k)  ■ 

For fine grid nodes which also exist in the coarser grid,
the corrections are directly added.


are  connected  to  form  prismatic  cells  in  the  lower  part  of  the  grid 
and  tetrahedral  cells  in  the  upper  part. 

The flow is coming from the left side parallel to the boundary
planes. In the rear part of the lower plane a no slip plate is located.
As can be seen from the pressure distribution on the boundary
planes in figure 9, the beginning of the plate is characterized by
a flow stagnation.


Fig.  9:  Hybrid  grid  for  flat  plate  and  isobars 


The node P2 in figure 8 does not exist in the coarser grid k+1.
The control volume of this node has been agglomerated with the
control volume around P0. If the corrections are assumed to be
constant over the coarse grid control volume, one can write:


In  order  to  create  a  tetrahedral  grid  the  prismatic  cells  are  sub¬ 
divided.  Hence,  the  point  distribution  is  identical  in  both  grids. 
Figure  10  presents  the  tetrahedral  grid  and  the  respective  solu¬ 
tion. 


_  T^(«) 


i:i-nj+co.(k+i,k) 


A higher accuracy can be obtained if the corrections are
reconstructed linearly over the coarse grid control volumes. The
reconstruction is similar to the reconstruction of flow variables
as described in section 5.2. Using the values in the neighboring
control volumes, a correction gradient $\nabla C_{0,(k+1,k)}$ can be
determined for the control volume of $P_{0,k+1}$:

$$\nabla C_{0,(k+1,k)} = \frac{1}{V_{0,k+1}} \sum_{i} \frac{1}{2} \left( C_{0,(k+1,k)} + C_{i,(k+1,k)} \right) \vec{S}_{(0,i),k+1} \qquad (30)$$

with $\vec{S}_{(0,i),k+1}$ representing the normal vector related to the edge
between node P0 and Pi. The correction in node P2 can then be
obtained as:

$$C_{2,(k+1,k)} = C_{0,(k+1,k)} + \nabla C_{0,(k+1,k)} \cdot \vec{r}_{0,2}$$

where $\vec{r}_{0,2}$ denotes the vector from P0 to P2.


When  the  solution  also  on  the  finest  grid  is  corrected,  the  next 
iteration  n  starts  with: 


Fig.  10:  Tetrahedral  grid  for  flat  plate  and  iso¬ 
bars 


8.  NUMERICAL  RESULTS 
8.1  Laminar  Flow  over  a  Flat  Plate 

The simulation of a laminar flow over a flat plate was used to
validate the formulation of the viscous term evaluation. Figure 9
depicts boundary planes of the three dimensional hybrid grid. The
grid consists of three layers with 60x40 points each. The points


In figure 11 the convergence history is presented. For the
calculation on the hybrid grid a convergence of 2.5 orders of
magnitude is obtained after about 1600 iterations. The calculation was
performed in the single grid mode in order to make it comparable
to the results obtained on a pure tetrahedral grid. As one can
see from figure 11 the convergence is worse for the tetrahedral
case. This may be due to the disturbance of the solution caused
by the diagonal edges near the wall. Furthermore, because of the
higher number of edges more computational work is required for
each time step than in the hybrid case.


Fig. 11: Convergence history for flat plate

Figure 12 depicts a comparison at different points between
the Blasius solution and the computed solution for a subsonic
($Ma_\infty$ = 0.5) laminar flow with a Reynolds number of 5000. The
solution is almost identical for both the tetrahedral and the
hybrid case.


Fig.  12:  Comparison  of  the  computed  solution 
with  analytic  solution 


8.2  Inviscid  Flow  around  ONERA  M6  Wing 

A three dimensional test case for the scheme is the inviscid flow
around an ONERA M6 wing. The prismatic part of the grid, as
can be seen from figure 13, has an O-topology. It consists of
seven layers of prismatic cells. The triangular cell faces are
located on the wing surface, while on the symmetry plane
quadrilateral faces are visible. Outside the prismatic region the space
is discretized with tetrahedral cells.


Fig. 13: Hybrid grid around ONERA M6 wing

Figure 14 depicts the solution obtained on the hybrid grid for an incidence of 3.06° and a Mach number of 0.84. The characteristic λ-shock that is visible on the upper wing surface is captured within two or three cells. No oscillations occur at the interface between the prismatic and the tetrahedral domains.


Fig.  14:  Isobars  of  transonic  flow  around 
ONERA  M6  wing 

On the same grid, a subsonic flow with a free-stream Mach number of $Ma_\infty = 0.5$ and an incidence of $\alpha = 3.0°$ is also simulated. The corresponding solution is presented in figure 15.


Fig.  15:  Isobars  of  subsonic  flow  around 
ONERA  M6  wing 


The convergence acceleration achieved by the multigrid algorithm is presented in figure 16. For the multigrid calculation, a convergence of five orders of magnitude is obtained after about 890 seconds of computational time, while on the single grid the solution has converged by less than two orders of magnitude in 1500 seconds.

Fig. 16: Convergence history for the simulation of subsonic flow around ONERA M6 wing


9.  CONCLUSIONS 

A finite volume scheme based on hybrid grids is presented. The grids employed consist of prismatic cells near body surfaces and tetrahedral cells connecting the prismatic domains and the outer boundaries. The use of prismatic cells makes it possible to resolve viscous-dominated flows such as boundary layers efficiently and accurately by applying high-aspect-ratio cells in the respective areas. Thanks to the tetrahedral cells, the grids become quite flexible, and grid generation, even for complex configurations, is considerably easier than with structured approaches.

In the preprocessing, an auxiliary mesh of control volumes is computed from the initial grid. The auxiliary mesh covers the entire computational domain and can be used in both the tetrahedral and the prismatic domains. In the flow solver part of the scheme an edge-based data structure is used, so the cell structure given by the initial grid becomes unnecessary. The feasibility of employing hybrid grids even for three-dimensional flow calculations is demonstrated.

The multigrid algorithm based on the agglomeration of control volumes is a natural extension of the dual mesh technique. It fits the edge-based data structure well and results in a small memory requirement.

The calculation of the inviscid fluxes is demonstrated to be efficient and accurate. Shocks are captured well by the upwind flow solver employed. The formulation used to calculate the viscous fluxes has also proved its accuracy. At the interface between the prismatic and the tetrahedral region, wiggles occur in the flow solution in some cases. These wiggles will be the subject of more detailed investigations.

Though the multigrid formulation works well for the case presented here, it still has to be improved, as the gain for cases with high-aspect-ratio cells is less than one would expect. In order to enable the simulation of viscous flows around three-dimensional geometries as well, the next step will be the implementation of a suitable turbulence model. With the improvement of the multigrid algorithm, even the simulation of high Reynolds number flows is expected to become feasible for complex geometries, such as flapped wings or complete aircraft configurations.

Finally, a grid adaption algorithm, based either on local refinement by cell division or on global remeshing, will be formulated.
SIMULATION OF THE RELATIVE MOTION OF BODIES SUBJECTED
TO AN UNSTEADY FLOW
BY AN OVERLAPPING-GRID METHOD


P. Brenner

Aérospatiale Espace & Défense
BP2 78133 Les Mureaux CEDEX, France


SUMMARY

We present a method suited to the numerical simulation of rocket stage separations in the presence of aerodynamic loads.

A conservative overlapping-grid technique is used to simulate the displacement of the various moving parts caused by the aerodynamic and propulsive forces.

The compressible multi-gas Euler equations are solved by means of an unstructured Finite Volume discretization (tetrahedra, prisms and hexahedra). The flow variables are located at the centre of gravity of each cell. The spatial numerical scheme is upwind and second-order accurate on the fluxes. It is based on the Godunov algorithm.

The adaptive time integration method that is implemented makes it possible to simulate strongly unsteady flows with moving strong shocks while limiting the computational cost, since the separations may last several seconds.

The chosen algorithms give the code robustness and accuracy even though the space-time discretization is irregular (the time step differs from cell to cell, the grids are unstructured, and they overlap).

ABSTRACT 

A computational method for the simulation of rocket stage separations under aerodynamic and propulsive loads is presented.

To simulate the motion of the bodies, a conservative overlapping-grid technique is used.

The flow solver is based on a cell-centred Finite Volume formulation on unstructured grids (made of tetrahedra, prisms and hexahedra).

The Euler equations for gas mixtures are solved with a second-order upwind scheme that uses the Godunov algorithm to compute the numerical fluxes. To integrate the equations in time, a temporally adaptive algorithm is used, since the real duration of the simulated phenomena is long. It saves computer time and leads to accurate simulations of unsteady phenomena such as acoustic waves and shock displacements.

Despite the irregular spatio-temporal discretisation (time steps differ from cell to cell, meshes are unstructured...), the algorithms used combine accuracy with robustness.

INTRODUCTION 

Methods for computing unsteady flows around moving bodies are of definite interest for simulating the separation of empty rocket stages.

Indeed, test facilities capable of such simulations are very difficult to set up:

- the external flow is strongly supersonic, so the wind-tunnel test section will be small,

- the engines continue to eject gases whose static pressure is high compared with the external pressure, which leads to large jet plume expansions capable of blocking the test section,

- the propulsive gases are thermodynamically very different from air, so this difference in composition must also be simulated,

- finally, the motion of the bodies must be slaved to the forces exerted in real time, which requires a complex system that can measure these loads correctly and then interpret them to update the position of the moving bodies.

This last point appears very restrictive because the size of the test section is limited. The system in question must therefore not be too bulky, and its response time must be very short, because the simulated duration is proportional to the scale used (that is, a separation lasting one second in reality will last five hundredths of a second in a test facility at one-twentieth scale). When the phenomena studied are truly unsteady, it therefore seems more feasible to use an essentially numerical approach, as we have done with the FLUSEPA code (ref. 7 and 8).

The choice of formulations, numerical schemes and algorithms follows from the constraints encountered in this type of simulation:

1- the geometry of the stages may be complex, but above all the geometry of the interstage is always complex (presence of equipment, separation systems...),


Paper  presented  at  the  AGARD  FDP  Symposium  on  "Progress  and  Challenges  in  CFD  Methods  and  Algorithms” 


2- the flow varies very abruptly from high supersonic to low subsonic, and the pressure ratio across strong shocks can reach several thousand,

3- the relative motion of the various bodies is completely arbitrary (complex rotation, large translation...),

4- both the flow and the position of the stages can evolve rapidly, and acoustic or shock-propagation phenomena often dominate the evolution of the aerodynamic field.

The first constraint led us to use an unstructured formulation, which offers great flexibility and ease of use in defining the grids.

The second led us to choose a Godunov-type finite volume method (second order), which is very robust and accurate. Note that in the formulation used, the flow variables are located at the centre of gravity of the cells.

The third rules out any technique based on grid deformation. Indeed, when the motions are large and arbitrary, a deformation can modify the control cells locally so strongly that the twisting turns cells inside out, to the point of producing negative volumes.

As for methods using adaptive remeshing, they seemed to us too cumbersome and restrictive for unsteady problems when one wishes to guarantee the conservativity of a system (or the entropy increase, for example). We therefore opted for a conservative grid-overlapping technique. To account for the relative motion of the grids attached to the moving bodies, we use a mixed Euler-Lagrange flux formulation (A-L-E), which simplifies the method and, above all, ensures the conservativity of the system. Indeed, although the grids are rigid, this technique makes it possible to work in a single reference frame, unlike purely Eulerian formulations, which require one reference frame per grid and then the introduction of inertial forces due to the frame motion, which are treated as source terms and degrade global conservativity.

Note that for an upwind flux method, A-L-E changes the algorithm only slightly, since it suffices to take the velocity of the faces into account in the upwinding. Finally, the interface between grids is treated like the faces of any cell, without any change of reference frame.
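
To make the remark on upwinding concrete, here is a minimal sketch for a passively convected scalar. It is an illustration under stated assumptions, not the FLUSEPA implementation: the function name `ale_upwind_flux`, its argument layout and the scalar model problem are hypothetical. The only point shown is that the upwind direction is decided by the fluid velocity relative to the moving face.

```python
def ale_upwind_flux(a_left, a_right, c, u_face, s):
    """Flux of the scalar a through a face moving at velocity u_face.

    c and u_face are the fluid and face velocity vectors; s is the face area
    vector pointing from the left cell to the right cell."""
    # Volume swept by the fluid relative to the moving face, per unit time
    relative_rate = sum((ci - ui) * si for ci, ui, si in zip(c, u_face, s))
    # Upwinding uses the sign of the relative velocity, not of c alone
    upwind_value = a_left if relative_rate >= 0.0 else a_right
    return relative_rate * upwind_value
```

With a face at rest this reduces to the usual upwind flux; a face moving with the fluid carries no flux; a face overtaking the fluid takes its value from the cell ahead of it.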

As for the last constraint, it led us to rule out implicit integration methods, which damp out a large part of the acoustics and, at best, smear propagating shocks. We therefore developed an explicit integration method that nevertheless accounts for the specific character of the local flow: when the phenomena are fast in a cell, the number of iterations on that cell is large; otherwise it is small. This is an adaptive time integration method, which can be regarded as a consistent and conservative local time-stepping technique.

In the description that follows, we emphasize the problems of conservativity, consistency and stability that governed the choice of the algorithms used.


The finite volume (F.V.) method relies on solving the equations in integral form, that is, a balance of the conserved quantities is written over a control element. Note that, strictly speaking, the balance must hold for any control element belonging to the domain studied.

1- GENERAL EQUATIONS

The Euler equations in integral form for compressible multi-gas three-dimensional flows, when the control element is moving, can be written in the following form:


where $\partial\Omega$ is the boundary enclosing the control element $\Omega$, $\rho$ is the density, $\vec V$ is the fluid velocity in the Galilean reference frame of the computation, $\vec U$ is the velocity of the boundary $\partial\Omega$, $E_t$ is the specific total energy, $C_v$ the specific heat at constant volume and $N$ the number of moles per unit mass.

The first equation concerns the variation of the control volume. Since the grids are rigid, it is only useful at the interface between grids, as we will show later.

The two equations for $C_v$ and $N$ are included to simulate the mixing of gases assumed to be perfect and thermodynamically perfect. This very simple model is valid only if the medium is non-reacting.
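
As an illustration of why transporting $\rho C_v$ and $\rho N$ is sufficient for a non-reacting mixture of perfect gases: both quantities mix linearly in mass fraction, and the ideal-gas law then gives the pressure. The sketch below is not FLUSEPA code. The function names, the use of the specific internal energy $e = C_v T$ and the sample air properties in the usage note are assumptions made here.

```python
R_UNIVERSAL = 8.314462618  # universal gas constant, J/(mol K)


def mix(mass_fractions, cv_list, n_list):
    """Mass-weighted Cv [J/(kg K)] and N [mol/kg] of a non-reacting mixture."""
    cv = sum(y * c for y, c in zip(mass_fractions, cv_list))
    n = sum(y * m for y, m in zip(mass_fractions, n_list))
    return cv, n


def pressure(rho, e_internal, cv, n):
    """Ideal-gas pressure from density, specific internal energy, Cv and N."""
    temperature = e_internal / cv          # calorically perfect gas: e = Cv T
    return rho * n * R_UNIVERSAL * temperature


def gamma(cv, n):
    """Ratio of specific heats, using Cp = Cv + N R for a perfect gas."""
    return 1.0 + n * R_UNIVERSAL / cv
```

For example, with approximate air values ($C_v \approx 717.5$ J/(kg K), molar mass $\approx 0.028965$ kg/mol, hence $N \approx 34.5$ mol/kg), `gamma` returns a value close to 1.4.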

2- AERODYNAMIC SOLVER

Although it is difficult to decouple the spatial discretization from the temporal discretization, we will do so here in order to highlight the path we followed.

2.1- Spatial discretization


Flux computation

Let us approximate the envelope $\partial\Omega_i$ of element i by a polyhedron with N plane faces denoted $S_{ij}$, oriented from i towards its neighbours j.

The most natural discretization is due to Godunov (Ref. 1): piecewise-constant states are considered on each cell, and the numerical fluxes on each interface are then computed by solving the Riemann problem thus posed.

Several techniques have been proposed for solving it approximately.

In our opinion, the problems they raise, whether lack of reliability (Roe flux, Osher flux...) or large numerical viscosity (Van Leer flux...), make them unattractive compared with the exact Godunov method (Ref. 2). Although it has a reputation for being expensive because it is iterative, our experience shows that, for a comparable level of optimization on a vector computer (CRAY), the Godunov algorithm is at most 20% more expensive than Osher's.

For all these reasons, we opted for the original Godunov algorithm.

Note that we ruled out the so-called centred methods because they require the introduction of a tunable artificial viscosity term and therefore generally cannot be used as a "black box".

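Let us study the accuracy of the F.V. approximations.

Let (2) $\vec F$, the flux vector built from the conserved quantities $(\rho,\ \rho\vec V,\ \rho E_t,\ \rho C_v,\ \rho N)$ and the relative velocity $(\vec U - \vec V)$, be continuous and differentiable;

then (3) $\oint_{\partial\Omega} \vec F\cdot d\vec\sigma = \iiint_{\Omega} \mathrm{div}(\vec F)\, d\omega$.

Let (4) $\tilde F(M) = F(M) + O(h^n)$ be an approximation of the exact flux to order n at M, h being a characteristic dimension of $\Omega$, whose volume is equal to $\omega$.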


Since F is a smooth function, one can study the accuracy of the discretization at the centre of gravity G of $\Omega$ when fluxes of order n are used.

After some algebra:

(5) $\frac{1}{\omega}\oint_{\partial\Omega} \tilde F\cdot d\vec\sigma = \mathrm{div}(F_G) + O(h^2) + O(h^{n-1})$

Note that, under certain approximation conditions and for particular geometric configurations, the second truncation term gains one order of accuracy, and that away from the point G the first term drops to order 1.

We therefore see that the scheme we use, which is of order 1 for the fluxes, is in general of order zero and hence inconsistent in the finite-difference sense on arbitrary grids, and in particular on unstructured ones.

Extension to second order in space

To ensure the consistency of the scheme, the states on either side of the interfaces $S_{ij}$ must therefore be computed to at least second order.

To do so, we use the M.U.S.C.L. approach (Ref. 3): it consists in linearly reconstructing the variables $\rho$, $\rho\vec V$, $P$, $C_v$ and $N$ on each cell.

The states needed to solve the Riemann problem are then second order, so the computed fluxes are indeed of order 2.

Our integration scheme uses one point per face; to preserve the order of accuracy, this point must therefore be located at the centre of gravity of the face.

For the gradient computation, we use a least-squares method whose stencil consists of the principal neighbours (that is, those sharing a face with the element considered). In this way we obtain "centred" gradients whose accuracy is at least first order (on regular or irregular grids), so the reconstruction is indeed second order.
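
A least-squares gradient over face neighbours, followed by a linear reconstruction at a face centroid, can be sketched as follows. This is a two-dimensional, unweighted illustration written for this text, not the FLUSEPA routine: the function names, the data layout and the absence of any weighting are assumptions.

```python
def least_squares_gradient(xc, ac, neighbours):
    """Gradient of a at the cell centre xc from face-neighbour data.

    neighbours is a list of ((x, y), value) pairs, one per cell sharing a face."""
    sxx = sxy = syy = bx = by = 0.0
    for (x, y), a in neighbours:
        dx, dy, da = x - xc[0], y - xc[1], a - ac
        sxx += dx * dx
        sxy += dx * dy
        syy += dy * dy
        bx += dx * da
        by += dy * da
    # Normal equations of the 2x2 least-squares problem; det is non-zero
    # as long as the neighbour centres are not all aligned with xc.
    det = sxx * syy - sxy * sxy
    return ((syy * bx - sxy * by) / det, (sxx * by - sxy * bx) / det)


def reconstruct(xc, ac, grad, xf):
    """Linear (MUSCL-type) reconstruction of a at the face centroid xf."""
    return ac + grad[0] * (xf[0] - xc[0]) + grad[1] * (xf[1] - xc[1])
```

On a linear field the gradient is recovered exactly whatever the position of the neighbours, which is the property that makes the reconstruction second order on irregular grids.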

For greater flexibility, we use unstructured grids made up of tetrahedra, prisms, pyramids and hexahedra: the latter type of element makes it possible, when the grids are regular, to obtain second-order accuracy in the "important" regions, while the other types serve as "filling". Consequently, when the quadrilateral faces (of the hexahedra, for example) are no longer plane, two integration points per face would be needed to ensure consistency. We plan to make this modification to the scheme in the short term.

Gradient limiting

The preceding spatial approximation is consistent; unfortunately, its time integration raises some difficulties:

1 - for an explicit Euler integration scheme, one-dimensional Fourier analysis (of the linearized transport equation) demonstrates that the method is unstable for long wavelengths (Ref. 4).

2 - whatever the integration scheme, the method is strongly oscillatory near discontinuities (shocks, large variations in cell size...).

The simplest solution to the first problem is to use a more elaborate time scheme (explicit Heun scheme, implicit Euler...).

To resolve the second difficulty, on the other hand, the solution we use is akin to that of Van Leer, which consists in limiting the slopes as follows.

Consider, for simplicity, the case of a multidimensional linear convection equation, at the (constant) velocity $\vec C$, for the scalar variable a:

(6) $\frac{\partial a}{\partial t} + \vec C\cdot\mathrm{Grad}(a) = 0$

which is integrated in time over cell i by a one-step method, after a finite-volume discretization, in the form:

(7)

where $\Delta t$ is the time elapsed between instants n and n+1 and $\tilde a_{ij}$ is the value of a that determines the flux at the centre of gravity of face $S_{ij}$ between n and n+1.

We wish to construct a scheme with locally bounded variation, that is, one satisfying the following constraint:

(8) $a_{min} = \min_j(a_j^n) \le a_i^{n+1} \le \max_j(a_j^n) = a_{max}$

Note that such a scheme is positive. After some manipulation, the following sufficient but not necessary conditions are obtained:

(9a) $(a_{max} - a_i^n)\left(\frac{1}{CFL} - 1\right) \ge a_i^n - \min_j(\tilde a_{ij})$

(9b) $(a_i^n - a_{min})\left(\frac{1}{CFL} - 1\right) \ge \max_j(\tilde a_{ij}) - a_i^n$

(9c)

with

(9d) $CFL = \frac{\Delta t}{2\omega}\sum_j \left|\vec C\cdot\vec S_{ij}\right|$

This formulation of the CFL number coincides with the usual one-dimensional formulation and with that of Godunov in several dimensions for grids of regular parallelepipeds.

Thanks to these conditions, we easily recover the results established in one dimension for regular grids when integrating with explicit Euler:

- the first-order Godunov method is stable at CFL equal to 1,

- the minmod limiter is stable at CFL equal to 2/3,

- Van Leer's first limiter, which corresponds to condition (9c), is stable at CFL equal to 1/2...

Although the preceding reasoning stems from the linearization of a scalar equation, the main value of this set of constraints is that:

1- it can be used in several dimensions,

2- it is applicable to many time schemes,

3- it is not tied to the type of reconstruction,

4- it is local, so it allows the stability of local time-stepping methods to be studied.

In our code, we use the limiter corresponding to CFL equal to 1/2, limiting the gradient globally (on each element and for each variable) by multiplying it by a coefficient that allows (9a), (9b) and (9c) to be satisfied.

Note in passing that the use of such a limiter can sometimes (rarely if the grid is regular) reduce the accuracy of the fluxes to order 1. For example, for the limiter corresponding to CFL equal to 1/2, second order is effective only when the maximal polyhedron whose vertices are the centres of the neighbours j of i contains both the centres of gravity of the faces $S_{ij}$ and their mirror images with respect to the centre of gravity G of the element i considered.

The two-dimensional case below illustrates this condition: the maximal polygon is the triangle (1,2,3); the centre of gravity of face Sg1 and its mirror image, point (a), are both contained in this polygon; on the other hand, although the centre of gravity of Sg3 satisfies the condition, its mirror image, point (c), does not, and for Sg2 the opposite holds: only the mirror image (b) lies inside the triangle (1,2,3). This grid configuration can therefore make the numerical scheme inconsistent because of the global limiting.

We shall not prove it here, but we can say that this condition is sufficient without always being necessary: it depends on the aerodynamic field studied.
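
The cell-wise CFL number can be checked numerically. The sketch below assumes that (9d) reads $CFL = \frac{\Delta t}{2\omega}\sum_j |\vec C\cdot\vec S_{ij}|$, which is a reconstruction of a damaged formula; the function names and the cube test cell are hypothetical helpers introduced for illustration.

```python
def cfl_number(dt, volume, velocity, face_vectors):
    """CFL = dt / (2 * volume) * sum_j |C . S_ij| for one cell."""
    total = 0.0
    for s in face_vectors:
        total += abs(sum(c * si for c, si in zip(velocity, s)))
    return dt * total / (2.0 * volume)


def cube_faces(h):
    """Outward area vectors of an axis-aligned cube of side h."""
    a = h * h
    return [(a, 0, 0), (-a, 0, 0), (0, a, 0), (0, -a, 0), (0, 0, a), (0, 0, -a)]
```

For a cube of side h and a velocity along x, the expression reduces to the familiar $|C|\,\Delta t/h$; for a general velocity it gives $\Delta t\,(|C_x| + |C_y| + |C_z|)/h$, the usual value on a regular grid of parallelepipeds.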

Finally, the limiting we use does not guarantee monotonicity of the scheme in one dimension. When studying, for example, an expansion into vacuum, this characteristic can become a handicap: it is then difficult to keep reasonable CFL numbers without dropping to order 1. To overcome this problem, we developed a "monotonizing" procedure, which consists in taking, between $\tilde a_{ij}$ and $\tilde a_{ji}$, the value closest to $a_i$ (hence the value closest to first order), and we proceed likewise on the j side. Clearly, the reconstruction then becomes monotone; moreover, on a regular grid this correction is only third order for the fluxes, so the overall result remains second order when the global limiter is not active.
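
The global limiting, in which the gradient of a cell is multiplied by a single coefficient, can be sketched for one scalar as follows. This is an interpretation, not the author's code. It assumes that the two constraints read $(a_{max} - a_i)(1/CFL - 1) \ge a_i - \min_j \tilde a_{ij}$ and $(a_i - a_{min})(1/CFL - 1) \ge \max_j \tilde a_{ij} - a_i$, and it further assumes that the third condition, whose expression did not survive in this text, bounds the face values by $a_{min} \le \tilde a_{ij} \le a_{max}$. The extrema are taken over the cell and its face neighbours, and the function name is hypothetical.

```python
def limiter_coefficient(a_i, neighbours, increments, cfl):
    """Largest phi in [0, 1] such that the face values a_i + phi * d_j comply.

    neighbours: cell averages of the face neighbours of cell i
    increments: unlimited increments d_j = grad . (x_face_j - x_i)"""
    a_max = max([a_i] + list(neighbours))
    a_min = min([a_i] + list(neighbours))
    k = 1.0 / cfl - 1.0
    d_max = max(0.0, max(increments))
    d_min = min(0.0, min(increments))
    phi = 1.0
    if d_min < 0.0:
        phi = min(phi, k * (a_max - a_i) / (-d_min))   # first constraint
        phi = min(phi, (a_i - a_min) / (-d_min))       # assumed bound on face values
    if d_max > 0.0:
        phi = min(phi, k * (a_i - a_min) / d_max)      # second constraint
        phi = min(phi, (a_max - a_i) / d_max)          # assumed bound on face values
    return phi
```

With `cfl=0.5` the factor `k` equals 1, so each face value and its mirror image about $a_i$ must both stay within $[a_{min}, a_{max}]$; at a local extremum the coefficient drops to zero and the cell falls back to first order.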

2.2- Temporal discretization

Global time-step integration scheme
We stressed above that the unlimited second-order scheme is unstable when integrated in time with an explicit Euler method. To remedy this drawback, and to increase the stability of the limited scheme (and thus work with a CFL greater than 1/2), we implemented a second-order explicit integration scheme.

Indeed, linear stability analysis by Fourier analysis shows that the classical second-order integration schemes are stable for a CFL equal to 1 when a second-order upwind spatial discretization with centred slope is used.

To obtain a relatively inexpensive scheme, the computational cost of the various algorithms involved in the spatial discretization must be taken into account. Approximately, solving the Riemann problem represents 15% of the total cost, computing the gradients 30%, the global limiting about 30% and the local limiting about 5%, the remainder being difficult to itemize.

We therefore see that recomputing the gradients and limiting them globally again should be avoided if possible. On the other hand, it is acceptable to redo a local limiting and to recompute the fluxes.

Moreover, one must bear in mind that the equations to be solved are not linear: all linearized schemes are equivalent as far as Fourier analysis is concerned. The scheme with the best nonlinear behaviour should be used.

We therefore chose the Heun scheme, but with a single gradient computation. As for the global limiting, a simplified procedure allows it to be merged with the local limiting without loss of efficiency.

Recall that the Heun scheme consists in computing the fluxes (and the gradients) at instant n; with this approximation, the state at instant n+1 is computed, and hence the fluxes at instant n+1. The variation between instants n and n+1 then corresponds to the half-sum of the fluxes previously computed at n and n+1. This scheme is very stable for nonlinear phenomena.
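
The predictor and half-sum structure of the Heun scheme can be sketched on a one-dimensional model problem. The example below is a simplification chosen here, not the paper's solver: it uses first-order upwind fluxes for linear advection on a periodic grid, whereas the code described above uses limited second-order Godunov fluxes with a single gradient evaluation. The function names are hypothetical.

```python
def upwind_rhs(a, c, dx):
    """First-order upwind flux balance for da/dt + c da/dx = 0, c > 0, periodic."""
    n = len(a)
    return [-(c / dx) * (a[i] - a[i - 1]) for i in range(n)]


def heun_step(a, dt, rhs):
    """One Heun step: predictor at n+1, then half-sum of both flux balances."""
    k_n = rhs(a)                                          # flux balance at instant n
    predicted = [ai + dt * ki for ai, ki in zip(a, k_n)]  # provisional state at n+1
    k_np1 = rhs(predicted)                                # flux balance at instant n+1
    return [ai + 0.5 * dt * (p + q) for ai, p, q in zip(a, k_n, k_np1)]
```

A call such as `heun_step(a, dt, lambda s: upwind_rhs(s, c, dx))` advances the solution by one step; because each update is a difference of face fluxes, the sum of the cell values is preserved on the periodic grid.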

Adaptive time integration scheme
The Heun scheme has many good numerical properties; unfortunately, like any explicit scheme, it is practically unusable when the duration to be simulated is long.

If one analyses the phenomena involved in a stage separation, one immediately notices that they are quasi-steady over almost the whole computational field. Only a few regions are swept by fundamentally unsteady flows. It is therefore worthwhile to use small time steps in those regions, whereas elsewhere large time steps are sufficient.

We therefore developed a local time-stepping technique that is conservative, consistent and stable: adaptive time integration (Ref. 5 and 8).

In each cell, the time step used is as close as possible to the maximum admissible explicit time step.

Let $\Delta t_{min}$ be the smallest time step over the whole domain. To simplify the management of the different temporal classes, the time steps are ordered in powers of 2, proportionally to $\Delta t_{min}$.

That is, if the admissible time step in cell i is $Dt_i$, it is replaced by:

(10) $\Delta t_i = \Delta t_{min}\cdot 2^{L_i}$

where $L_i$ is the temporal level of cell i, such that:

(11) $\Delta t_{min}\cdot 2^{L_i} \le Dt_i < \Delta t_{min}\cdot 2^{L_i+1}$
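
The assignment of temporal levels can be sketched as follows, reading (10) and (11) as $\Delta t_i = \Delta t_{min}\, 2^{L_i}$ with $\Delta t_{min}\, 2^{L_i} \le Dt_i < \Delta t_{min}\, 2^{L_i+1}$. The function name and the returned tuple are choices made for this illustration.

```python
def temporal_levels(admissible_dts):
    """Return (dt_min, levels, rounded time steps) for a list of admissible steps."""
    dt_min = min(admissible_dts)
    levels = []
    for dt in admissible_dts:
        level = 0
        # Raise the level while the next power of two still fits under dt
        while dt_min * 2 ** (level + 1) <= dt:
            level += 1
        levels.append(level)
    steps = [dt_min * 2 ** level for level in levels]
    return dt_min, levels, steps
```

Each admissible step is thus rounded down to the nearest power-of-two multiple of the smallest step, so the step actually used never exceeds the admissible one.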


Between two cells, the principle adopted is that the interface takes the stronger of the two temporal levels.

For the method to be conservative, the flux integrals on either side of the interface $S_{ij}$ must be identical. It therefore suffices to define the flux on $S_{ij}$ uniquely at every instant. Then, if for example $L_i = L_j + 1$, two iterations are performed in cell j for a single one in i. Thus, in j the flux integral is:

(12) $\int_{t}^{t+\Delta t_j} \vec F\cdot\vec S_{ji}\,dt + \int_{t+\Delta t_j}^{t+2\Delta t_j} \vec F\cdot\vec S_{ji}\,dt = \int_{t}^{t+2\Delta t_j} \vec F\cdot\vec S_{ji}\,dt$

and in i:

which proves the conservativity of the system.
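
The conservation argument can be checked on a small model problem. The sketch below is a deliberately simplified stand-in for the method described here: it uses explicit Euler with first-order upwind fluxes for one-dimensional periodic advection instead of the Heun/Godunov scheme, and it assumes that each face is refreshed at the rate of the finer of its two adjacent cells. The function name and the bookkeeping through a `pending` array are choices made for this illustration.

```python
def advance_local_time_steps(a, levels, c, dx, dt_min):
    """Advance da/dt + c da/dx = 0 (c > 0, periodic) over dt_min * 2**max(levels).

    Cell i uses the step dt_min * 2**levels[i]; face f lies between cells
    f-1 and f and is refreshed at the rate of the finer of the two."""
    n = len(a)
    a = list(a)
    face_level = [min(levels[f - 1], levels[f]) for f in range(n)]
    pending = [0.0] * n            # integrated fluxes not yet applied to each cell
    for sub in range(2 ** max(levels)):
        for f in range(n):
            if sub % 2 ** face_level[f] == 0:
                # One flux value per face and time interval, shared by both cells
                integral = c * a[f - 1] * dt_min * 2 ** face_level[f]
                pending[f - 1] -= integral
                pending[f] += integral
        for i in range(n):
            if (sub + 1) % 2 ** levels[i] == 0:   # cell i reaches the end of its own step
                a[i] += pending[i] / dx
                pending[i] = 0.0
    return a
```

Because every time-integrated face flux is subtracted from one cell and added to its neighbour, the sum of the cell values is unchanged whatever the distribution of levels; with all levels equal to zero the routine reduces to an ordinary explicit upwind step.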

Consider solving:


which, for smooth solutions, is equivalent to

(15) $\frac{\partial W}{\partial t} + \mathrm{div}(\vec F) = 0$

(with $W$ denoting the vector of conserved variables).

Suppose now an approximation of the flux such that:

(16) $\tilde F(g,t) = F(g,t) + O(\Delta t^m) + O(h^n) + O(\Delta t^p\, h)$

where g denotes the centre of gravity of the faces $S_{ij}$. Then, if this flux is integrated over the boundary of element i, assuming that the space-time truncation errors are independent for all faces, one can study the accuracy of the approximation at instant $T_m$ (midpoint of the two time integration bounds) and at G (centre of gravity of $\Omega$):

(17) $\left[\frac{\partial W}{\partial t} + \mathrm{div}(\vec F)\right]_{G,\,T_m} = O(h^{n-1}) + O(h^2) + O(\Delta t^2) + O\!\left(\frac{\Delta t^m}{h}\right) + O(\Delta t^p)$

For the scheme to remain consistent, m must be greater than 1, so the scheme must be at least second order in time on the fluxes (and if it is only second order, a condition of the type $\Delta t/h$ bounded is necessary).

The term in $\Delta t^p$ corresponds to the temporal approximation of the gradients: in the Heun scheme that we use, the gradients are computed only once, at the beginning of each iteration, so p equals 1 and the scheme is globally first order in time (a result already established since m equals 2).

As for the condition $\Delta t/h$ bounded, it is automatically fulfilled by the CFL condition.


The stability analysis of adaptive time schemes is relatively difficult because Fourier analysis can no longer be used. One can, however, study the diffusivity of the scheme for the simplest equation on a regular grid:

(18) $\frac{\partial a}{\partial t} + C\,\frac{\partial a}{\partial x} = 0$

If the scheme is diffusive, it has a chance of being stable; otherwise it is unstable.


For a Heun scheme that is first order in space, the condition for positive
numerical diffusion is simple: it suffices that the largest time step satisfy
the CFL condition. Unfortunately, for a scheme that is second order in space,
the condition depends on the method used to manage the set of cells.

An easier approach is to use the constraints that make the scheme of bounded
variation.

We begin by limiting the time step over the neighbourhood such that

(19)  $L_i = \min_j (L_j)$

then we reduce the time-step jump between cells such that:

(20)  $|L_i - L_j| \le 1, \quad \forall (i,j)$

This last operation requires $L_{max} - 1$ iterations, $L_{max}$ being the
maximum temporal level.
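
The two limiting steps (19) and (20) can be sketched in one dimension as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the function names, the power-of-two relation dt = 2**L * dt_min between level and time step, and the 1D neighbourhood (cells i-1 and i+1) are all choices made here.

```python
import math


def temporal_levels(dt_local):
    """Assign each cell a level L such that 2**L * dt_min <= dt_local (assumed convention)."""
    dt_min = min(dt_local)
    return [int(math.floor(math.log2(dt / dt_min))) for dt in dt_local]


def limit_levels(levels):
    """Apply (19), then (20), on a 1D chain of cells whose neighbours are i-1 and i+1."""
    n = len(levels)
    # (19): each cell takes the smallest level found in its neighbourhood (itself included).
    out = [min(levels[max(i - 1, 0):i + 2]) for i in range(n)]
    # (20): lower any level that exceeds a neighbour's level by more than one,
    # sweeping until no level changes.
    changed = True
    while changed:
        changed = False
        for i in range(n):
            for j in (i - 1, i + 1):
                if 0 <= j < n and out[i] > out[j] + 1:
                    out[i] = out[j] + 1
                    changed = True
    return out


# Two small cells followed by cells whose local time step is 40 times larger.
dt_local = [1.0, 1.0, 40.0, 40.0, 40.0, 40.0, 40.0, 40.0]
levels = limit_levels(temporal_levels(dt_local))
print(levels)
```

With these inputs the levels rise by one per cell away from the small cells, so a cell four levels above the finest one performs one iteration for every sixteen iterations of the finest cell.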

We thus obtain the following type of configuration in one dimension:

[Figure; horizontal axis: cell number]

Thus cell i+4 performs one iteration for every sixteen iterations of cell i.

A simple argument shows that the scheme remains locally of bounded variation
when the gradients are limited at the start of the iteration and then at the
end of the iteration. Moreover, if the condition $L_i = \min(L_j)$ is no
longer satisfied, the scheme is no longer of bounded variation.

In practice, we have observed that the method is stable for a CFL number of 1,
whereas the limiter guarantees stability only for a CFL number of 1/2.

Reducing the jump in temporal level is a priori unnecessary from the stability
standpoint, but this procedure greatly simplifies cell management (in
particular when computing gradients).

Regarding the efficiency of the method (defined as the ratio of the cost of
the computation with a global time step to the cost of the computation with
adaptive time stepping), it can be evaluated simply once the distribution
function of the time steps is known. In practice, this function depends on the
mesh and on local phenomena. It is therefore variable. The table below gives
the limits of this efficiency together with a mean value and an experimental
value (the mean value corresponds to an equiprobable distribution of the time
steps).


Δtmax/Δtmin    2      4      8      16     32      64
Efmax          1.6    3.2    6.4    12.8   25.6    51.2
Efmoy          1.25   2.05   3.5    6.05   10.65   19.1
Efexp                                      11.2    20

Note that these values are closely tied to the algorithms used, and that
unstructured formulations are well suited to this type of technique.

3- MESH OVERLAPPING

The basic idea is very simple (Refs. 6 and 7): consider that a mesh M1 behaves
like a mask that moves and partially covers another mesh M2. The interface
between the meshes is created naturally: it is the intersection surface formed
by hollowing out of the second mesh M2 the volume occupied by the mask M1.
The cells of M1 therefore undergo no modification. In M2, on the other hand,
three types of cells are present (cf. figure 1):

1- fully covered cells,

2- partially covered cells,

3- fully uncovered cells.

The cells of the second category are thus modified, because part of the faces
that bound them is covered and new faces corresponding to the outer boundary
of the mask are created. These new faces form the interface between cells of
M2 (cut) and cells of M1 (unmodified).

3.1- Computing the intersections

To simplify the geometric problem, we have assumed that all element faces are
planar. Quadrilateral faces are therefore treated as two triangular faces. We
are thus left with a single type of facet: the triangle. The intersection
algorithm reduces to two steps:

1- determine the degree of coverage of each face of the mesh by the mask,

2- determine the part of the outer boundary of the mask that closes each of
the cut cells of M2.

The two steps are in fact identical from an algorithmic point of view when
they are reformulated as follows:

1- For each face of M2, determine the part contained in each of the cells of
M1. The sum of these parts gives the coverage of the faces.

2- For each face forming the boundary of M1, determine the part contained in
each of the cut cells (a cell being considered cut if one of its faces is
partially or totally covered and if at least one of its faces is not totally
covered).

We thus see that the same basic algorithm is involved: determining the area of
a triangle contained in a polyhedron.

To solve this problem, the simplest approach is to work in the plane of the
triangular face. One determines the polygonal trace of the polyhedron in this
plane (a simple matter since the polyhedron is made up of triangles), then the
part common to the triangle and this polygon. This last operation requires
only knowledge of the oriented segments that make up the trace. Any algorithm
that determines how the segments chain together must be avoided: it is
unnecessary and excessively costly.

Computing the covered area is not sufficient; its centroid must also be
determined. One then determines the centroid of the cut interfaces of M2 and
that of the pieces of the boundary of M1 that close the cut cells.

The volumes and centroids of the cut cells of M2 are then computed using the
following formulas (Green):

(21)  $\omega_i = \frac{1}{3}\sum_j OM_{ij}\cdot S_{ij}$

(22)  $OG = \frac{1}{4\,\omega_i}\sum_j Og_{ij}\,\left(OM_{ij}\cdot S_{ij}\right)$

where:

-  $M_{ij}$ is any point of the planar face $S_{ij}$, oriented towards the
outside of i,

-  j is either a "natural" neighbour (hence another cell of M2) or an
overlapping one (hence a cell of M1 that is a neighbour through the M1-M2
interface),

-  G is the centroid of i and $g_{ij}$ the centroid of $S_{ij}$.
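
Formulas (21) and (22) can be checked on a simple cell. The sketch below is illustrative only: the face representation (a point on the face, the face centroid and the outward area vector) and the function names are assumptions made here, and the unit cube stands in for a cut cell.

```python
def dot(a, b):
    """Scalar product of two 3-vectors."""
    return sum(x * y for x, y in zip(a, b))


def cell_volume_and_centroid(faces, origin=(0.0, 0.0, 0.0)):
    """Volume (21) and centroid (22) of a closed cell bounded by planar faces.

    faces: list of (point_on_face, face_centroid, outward_area_vector).
    origin: the arbitrary reference point O of the formulas.
    """
    volume = 0.0
    moment = [0.0, 0.0, 0.0]
    for point, face_centroid, area_vector in faces:
        om = [point[k] - origin[k] for k in range(3)]
        og = [face_centroid[k] - origin[k] for k in range(3)]
        h = dot(om, area_vector)      # OM_ij . S_ij
        volume += h / 3.0             # formula (21)
        for k in range(3):
            moment[k] += og[k] * h    # numerator of formula (22)
    centroid = tuple(origin[k] + moment[k] / (4.0 * volume) for k in range(3))
    return volume, centroid


def unit_cube_faces():
    """Six faces of the unit cube, each with unit area and outward normal."""
    faces = []
    for axis in range(3):
        for side in (0.0, 1.0):
            centre = [0.5, 0.5, 0.5]
            centre[axis] = side
            normal = [0.0, 0.0, 0.0]
            normal[axis] = 1.0 if side == 1.0 else -1.0
            faces.append((tuple(centre), tuple(centre), tuple(normal)))
    return faces


volume, centroid = cell_volume_and_centroid(unit_cube_faces())
shifted_volume, shifted_centroid = cell_volume_and_centroid(
    unit_cube_faces(), origin=(2.0, -1.0, 3.0))
```

The result does not depend on the choice of O, which is why the second call with a shifted origin returns the same volume and centroid.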

3.2- Optimizing the number of operations

For the method to be usable, the time needed to compute the intersections must
be at most of the same order of magnitude as the time for one iteration of the
aerodynamic solver.

Let N be the number of cells; the number of facets on the boundary of M1 is
then of order $N^{2/3}$, and the number of cut cells is of the same order.

To determine the area of the boundary that closes each cut cell, about
$N^{2/3}$ operations are needed per cell.

For all the cells, of the order of $N^{4/3}$ operations are therefore needed.
The aerodynamic solver requires of the order of N operations per iteration.
The intersection computation must therefore be optimized.

The solution we have adopted consists in determining, on a Cartesian grid
(i,j,k) containing N cells, which grid cells the various facets forming the
boundary of M1 belong to. Then, for each cut cell, its position in the grid is
determined, and hence which facets can close it. This preconditioning of the
facets requires about N operations, and the computation per cut cell is of the
order of one operation, that is, about $N^{2/3}$ operations for all the cells.

The overall cost is of order N operations (the preconditioning is more
expensive than the intersection computation!) and is therefore compatible with
the aerodynamic solver.
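
The Cartesian-grid preconditioning can be sketched as a uniform bucket grid. This is a minimal illustration, not the FLUSEPA implementation: the function names, the single bin size and the use of axis-aligned bounding boxes to decide which bins a facet or a cell touches are all assumptions made here.

```python
import math
from collections import defaultdict


def _bins(lo, hi, bin_size):
    """All (i, j, k) bins touched by the axis-aligned box [lo, hi]."""
    ranges = [range(math.floor(lo[a] / bin_size), math.floor(hi[a] / bin_size) + 1)
              for a in range(3)]
    return [(i, j, k) for i in ranges[0] for j in ranges[1] for k in ranges[2]]


def build_facet_grid(facets, bin_size):
    """Preconditioning pass: register each triangular facet in the bins it may touch."""
    grid = defaultdict(list)
    for index, triangle in enumerate(facets):
        lo = [min(v[a] for v in triangle) for a in range(3)]
        hi = [max(v[a] for v in triangle) for a in range(3)]
        for key in _bins(lo, hi, bin_size):
            grid[key].append(index)
    return grid


def candidate_facets(grid, cell_lo, cell_hi, bin_size):
    """Facets that may close a cut cell, found from the bins its bounding box touches."""
    found = set()
    for key in _bins(cell_lo, cell_hi, bin_size):
        found.update(grid.get(key, ()))
    return sorted(found)


facets = [
    ((0.1, 0.1, 0.1), (0.4, 0.1, 0.1), (0.1, 0.4, 0.1)),   # facet 0, near the origin
    ((5.1, 5.1, 5.1), (5.4, 5.1, 5.1), (5.1, 5.4, 5.1)),   # facet 1, far away
]
grid = build_facet_grid(facets, bin_size=1.0)
near = candidate_facets(grid, (0.0, 0.0, 0.0), (0.9, 0.9, 0.9), 1.0)
far = candidate_facets(grid, (5.0, 5.0, 5.0), (5.5, 5.5, 5.5), 1.0)
none = candidate_facets(grid, (2.0, 2.0, 2.0), (2.5, 2.5, 2.5), 1.0)
```

Each cut cell then tests only the few facets returned for its bins instead of all boundary facets, which is what brings the per-cell cost down from about $N^{2/3}$ to about one operation.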

3.3- Mesh priority

Using a mask and a masked mesh lacks flexibility. Indeed, the cells of the
masked mesh may, for example, be better suited to computing a boundary layer
around the body attached to that mesh than the cells of the mask.

It is therefore useful to define priority zones that the mask cannot cover but
which, on the contrary, cover the mask.

Figure 2 shows the result of such a strategy. Its implementation poses no
particular problem.

3.4- Boundary flux computation and cell merging

The fluxes at the boundary are computed in the same way as the fluxes between
two cells belonging to the same mesh: since we work with an unstructured
formulation, topology matters little, so an interface between cells is always
treated in the same way, whether or not the cells belong to the same mesh.
As for the first equation of system (1), concerning the volume variation, it
makes it possible, when the motions are slow, to avoid recomputing the
intersections after every aerodynamic iteration:

-  for each cut cell i, the volume increment $\Delta\omega_i$ due to the
relative motion of the meshes is evaluated,

-  the maximum relative increment over all these cells is determined

(23)  $\Delta I_{max} = \max_i \left( \frac{\Delta\omega_i}{\omega_i} \right)$

where $\omega_i$ denotes the initial volume,

-  when $\Delta I_{max}$ is small, the volume balance is used to account for
the displacements without updating the interface characteristics,

-  when $\Delta I_{max}$ is large (or when the sum of the $\Delta I_{max}$
values computed since the last update of the intersections is large), all the
intersections are recomputed (the volume balance is then satisfied exactly
without using $\Delta\omega_i$).

This technique divides the cost of the intersection computations by a very
large factor (of the order of one hundred).

Finally, the problem of heavily covered cells remains to be dealt with.

Indeed, when the covering of a cell by the "mask" leads to very small volumes,
the CFL condition (9d) becomes too penalizing, since the time step must tend
to zero. The solution is to merge these cells with neighbouring cells that are
sufficiently uncovered. In this way, the set formed by a sufficiently
uncovered cell and its associates constitutes a "macro-cell" whose volume is
large enough not to penalize the time step.

We use the coverage ratio as the criterion: a cell must be merged when more
than 70% of its volume is covered. It is merged with the neighbour that shares
the largest interface with it and that is more than 30% uncovered by volume.

When merging is not possible directly, an iterative procedure is applied: the
cell is then merged via a cell that is itself merged (...).

When merging is not possible at all, the offending cells are removed from the
computation.

APPLICATIONS

We present here computational cases illustrating the capabilities of the
method.

Figures 3 and 4 show the type of applications handled with the FLUSEPA code.
These are three-dimensional simulations. The external Mach number lies between
5 and 6.

For the missile stage separation at incidence (figure 3), the simulated period
is about 150 ms and the computation takes 12 hours with a global time step.
Note that this type of simulation requires computing both the external flow
and the inter-stage flow, since they interact very strongly. Moreover, in the
inter-stage region, pressures can become high (when the passage area towards
the outside is small). As a result, the flow in the nozzle may be strongly
separated: it is therefore essential to compute it as well.

For the booster jettison (figure 4), the simulated period is about 1.1
seconds. The meshes contain about 100 000 cells and the computation takes
about 40 hours with adaptive time stepping (on a Cray YMP). The time saving
compared with a global time-step scheme is about a factor of 20. To underline
the robustness of the method, we point out that the phenomena encountered in
this simulation are strongly unsteady (acoustics...) and that, in the
strong-shock regions, the pressure ratios are of the order of 4 000.

Finally, note that validation studies (both two-dimensional and
three-dimensional), including comparisons with experimental measurements, have
been carried out successfully.

To keep this paper short, we do not present them here.

CONCLUSIONS

The theoretical work we have carried out establishes both the consistency and
the linear stability of the method on multidimensional meshes that are
irregular in space and in time. Numerical experiments have demonstrated the
good behaviour of the schemes when solving nonlinear systems.

As for the accuracy of the method, it is second order in space and in time on
regular structured hexahedral meshes and at least first order elsewhere.
Finally, we note that the code has considerable potential for further
development since, for example, it is conceivable to adapt the mesh by
deformation (given our A.L.E. formulation), by enrichment (we work with an
unstructured formulation) or through the overlapping of a mesh locally adapted
to the flow...


REFERENCES

(1)  S.K. GODOUNOV, A. ZABRODINE, M. IVANOV, A. KRAÏKO, G. PROKOPOV
"Résolution numérique des problèmes multidimensionnels de la dynamique des
gaz"
Editions MIR - Moscou

(2)  M. POLLET
"Comparison of transport schemes for Navier-Stokes equations, application to
rocket propulsion"
7th International Conference on Numerical Methods in Laminar and Turbulent
Flow, Stanford, USA, 1991.

(3)  B. VAN LEER
"Towards the Ultimate Conservative Difference Scheme V. A Second-Order Sequel
to Godunov's Method"
J. Comput. Phys. 32, 1979.

(4)  F. GODFROY, P. JACQUEMIN and F. JOUVE
"Three dimensional simulation of unsteady and inviscid flows using a second
order finite volume method. Application to flows inside solid propellant
motors"
Computing Methods in Applied Sciences and Engineering, Edition Glowinski, INRIA

(5)  W.L. KLEB and J.T. BATINA
"Temporal Adaptive Euler/Navier-Stokes Algorithm Involving Unstructured
Dynamic Meshes"
AIAA Journal, Vol. 30, No. 8, 1992

(6)  S.L. HANCOCK
"Finite difference equations for PISCES 2DELK, a coupled Euler-Lagrange
continuum mechanics computer program"
Physics International Technical Memo - TCAM 76-2, 1976

(7)  P. BRENNER
"Three-Dimensional Aerodynamics with Moving Bodies Applied to Solid
Propellant"
AIAA paper 91-2304, 1991.

(8)  P. BRENNER
"Numerical Simulation of Three-Dimensional and Unsteady Aerodynamics About
Bodies in Relative Motion Applied to TSTO Separation"
AIAA paper 93-5142, 1993.




Figure 1: mesh overlapping


Figure 3: separation at incidence by direct ignition

1.  SUMMARY

Recent progress in flux vector splitting is reviewed with the aim of obtaining
high resolution and robustness for hypersonic reacting flow simulations. The
numerical behavior of promising AUSM and CUSP discretization variants is
reported and compared. These schemes can be combined with explicit multistage
time stepping and multigrid. Large chemical source terms introduce stiffness
into the system of equations, which is removed by point-implicit treatment.
The present results demonstrate that efficient 3D simulations of viscous
reacting flows with large contrast generated by strong shocks are now
feasible.

2.  INTRODUCTION 

Accurate computations of 3D complex flow fields will play a key role in the
aerothermal design of high speed vehicles such as reentry configurations. Not
only can flow simulations shorten design cycles and save cost, but they also
reduce uncertainty margins in heat loads and aerodynamic forces. A prominent
example is the US Orbiter vehicle, whose thermal protection system is heavy
due to heat transfer uncertainties. Moreover, the space shuttle experienced an
unexpected hypersonic pitch-up which had not been predicted by conventional
cold hypersonic wind tunnel testing.

The extensive use of 3D flow simulations for complex configurations within the
aerodynamic design cycle has been precluded until recently for several
reasons. Among these are the long computation times of codes simulating
viscous flow or nonequilibrium chemistry. Also, many codes are not
sufficiently robust in flow regions with strong shocks and flow expansions.
Other codes are robust but fail to resolve contact discontinuities such as
boundary layers.

As a result of various attempts to solve complex hypersonic reacting flows
numerically, we can formulate some key requirements for the underlying
solution algorithm. These are

-  Capturing of strong shocks without oscillations of the dependent flow
variables

-  Robustness in regions of strong flow expansion

-  Capturing of grid-aligned slip lines without numerical smearing

-  Provision of an adaptive dissipative term in order to achieve sufficient
numerical damping under adverse grid or flow conditions

The first requirement addresses the ability of the scheme to resolve complex
3D shock interactions with a limited number of grid points. Moreover,
oscillations at shocks may prevent convergence of the overall method to the
desired steady state solution. The second point relates to the failure of
various prominent discretization schemes when applied to rapid flow expansions
into near-vacuum conditions. Additionally, the scheme should resolve viscous
shear layers with minimum numerical smearing in order to keep the number of
grid points reasonably small. The fourth requirement results from the
experience that one usually cannot avoid adverse grid situations in 3D,
particularly high values of cell aspect ratio. However, the available
convergence acceleration techniques such as residual smoothing and multigrid
rely on the damping of transient high-frequency modes in the solution, for
which controlled artificial dissipation is necessary.


3.  SHOCK  CAPTURING,  HIGH-ORDER 
SCHEME  AND  ADAPTIVE  DISSIPATION 

Progress in flux vector splitting has recently demonstrated that the
aforementioned requirements can be fulfilled without characteristic
decomposition of the inviscid flux and the corresponding matrix operations.
The present paper covers two promising approaches in this direction, which
were initiated by Liou [1,2] and Jameson [3,4]. Other related pieces of work
on the subject are found in Refs. [5-7]. The principal idea of flux vector
splitting is shown by application to the 1D Euler equation

$\frac{\partial W}{\partial t} + \frac{\partial F}{\partial x} = 0, \quad W = \begin{pmatrix} \rho \\ \rho u \\ \rho E \end{pmatrix}, \quad F = \begin{pmatrix} \rho u \\ \rho u^{2} + p \\ \rho u H \end{pmatrix}$    (1)


Paper presented at the AGARD FDP Symposium on "Progress and Challenges in CFD Methods and Algorithms"
held in Seville, Spain, from 2-5 October 1995, and published in CP-578.

Assume that the computational domain is discretized in intervals with the
centers denoted by i-1, i, i+1, ... and the cell faces by i-1/2, i+1/2, ....
Then, the numerical flux at the interface i+1/2 can be approximated according
to Liou [1,2]


where L and R denote the left and right states at the cell face, $M_{L/R}$ is
the upwind-weighted Mach number at the cell face, and $p^{+}$, $p^{-}$ are
Mach-number-weighted contributions of the left and right pressure values.
Upwind weighting is accomplished by suitable polynomials of the local Mach
number [2], by which the scheme is made purely upwind for supersonic flow
whereas central differencing is obtained for $M \to 0$. This scheme is called
AUSM.¹
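
A minimal sketch of an AUSM-type interface flux for the 1D Euler equations, written from the description above. The split polynomials used here are the commonly published Van Leer-type ones of the basic AUSM scheme; the function names, the primitive-variable state tuple (rho, u, p) and the perfect-gas relation with gamma = 1.4 are assumptions made for this illustration, not details taken from this paper.

```python
import math


def split_mach(m):
    """Split Mach numbers (M+, M-): quadratic for |M| <= 1, pure upwind otherwise."""
    if abs(m) <= 1.0:
        return 0.25 * (m + 1.0) ** 2, -0.25 * (m - 1.0) ** 2
    return 0.5 * (m + abs(m)), 0.5 * (m - abs(m))


def split_pressure(m):
    """Pressure weights (p+, p-) as polynomials of the local Mach number."""
    if abs(m) <= 1.0:
        return (0.25 * (m + 1.0) ** 2 * (2.0 - m),
                0.25 * (m - 1.0) ** 2 * (2.0 + m))
    return 0.5 * (m + abs(m)) / m, 0.5 * (m - abs(m)) / m


def ausm_flux(left, right, gamma=1.4):
    """Interface flux (mass, momentum, energy) from primitive states (rho, u, p)."""
    def advected(state):
        rho, u, p = state
        c = math.sqrt(gamma * p / rho)                        # speed of sound
        h_total = gamma / (gamma - 1.0) * p / rho + 0.5 * u * u
        return u / c, (rho * c, rho * c * u, rho * c * h_total)

    m_l, phi_l = advected(left)
    m_r, phi_r = advected(right)
    m_half = split_mach(m_l)[0] + split_mach(m_r)[1]          # interface Mach number
    p_half = split_pressure(m_l)[0] * left[2] + split_pressure(m_r)[1] * right[2]
    # Upwinded advection: sum of advected quantities minus |M| times their difference.
    flux = [0.5 * m_half * (a + b) - 0.5 * abs(m_half) * (b - a)
            for a, b in zip(phi_l, phi_r)]
    flux[1] += p_half                                         # pressure acts on momentum only
    return tuple(flux)


supersonic = ausm_flux((1.0, 3.0, 1.0), (1.0, 3.0, 1.0))
at_rest = ausm_flux((1.0, 0.0, 1.0), (1.0, 0.0, 1.0))
```

For a uniform supersonic state the sketch returns the exact flux of the left state (pure upwinding), and for a gas at rest it returns the averaged pressure with zero mass and energy flux, which are the two limits described in the text.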

The alternative approach followed by Jameson [3] is to take a central average
and subtract a diffusive term


with $W = (\rho, \rho u, \rho H)^{T}$. Now, the diffusion coefficients
$\alpha$ and $\beta$ must be chosen such that upwinding is obtained in the
supersonic range and $d \to 0$ for $M \to 0$. This scheme is called HCUSP.
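
For orientation, the flux form commonly associated with Jameson's CUSP construction is sketched below. It is given as an assumption about what "a central average minus a diffusive term" looks like, with $d_{i+1/2}$ a notation chosen here for the diffusive term; setting $F_{i+1/2} = F_R$ in it reproduces the Hugoniot-type equation for a moving shock used later in this section.

```latex
% Assumed form of the central-average-minus-diffusion flux
F_{i+1/2} = \tfrac{1}{2}\,\left(F_L + F_R\right) - d_{i+1/2},
\qquad
d_{i+1/2} = \tfrac{1}{2}\,\alpha c\,\left(W_R - W_L\right)
          + \tfrac{1}{2}\,\beta\,\left(F_R - F_L\right)
```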

Both approaches use scalar dissipation functions, so that the computational
expense of the overall method is proportional to N, where N is the number of
flow equations to be computed. Note that there is particular motivation to use
equation (3) rather than (2) if the discretization is combined with explicit
multistage time stepping. Then, very effective hybrid multistage schemes are
at hand [8] for which the dissipation terms are only evaluated at m out of a
total of n stages.

The conceptual differences between both flux vector split schemes show up for
the problem of resolving a stationary shock wave. Fig. 1 sketches the
situation encountered in the analysis of AUSM. It is assumed that the states
(L) and (R) fulfill the jump conditions.

¹ The extension to multidimensions on structured grids is standard and may be
found in Ref. [5].


Equilibrium of the state (L) is obtained if the flux in between (L) and (R) is
obtained by full upwinding. Also, the state (R) is in equilibrium if an upwind
flux is used for the interface. These requirements can be fulfilled by
defining the speed of sound at the shock

$c_L = c_R = u_R \quad \text{or} \quad c_L = c_R = \frac{c^{*2}}{u_L}$    (4)

for upwinding the flux, where $c^{*}$ is the critical speed of sound. Hence,
the state (R) is made supersonic and it is fully cancelled by the flux
formulation (2). Generalization of equ. (4) for arbitrary flow direction and
speed is given in Ref. [2]. This scheme is called AUSM+.

Another way to shock resolution with AUSM was obtained by the observation that
the highly dissipative flux of Van Leer [9] differs from AUSM by a dissipative
term.


for  0  <  M  <  1 


(5) 

in subsonic flow and it is identical in the supersonic region [5]. Smoothly
captured shocks may then be obtained by defining

$F_{Hybrid} = (1 - \omega)\,F_{Van\,Leer} + \omega\,F_{AUSM}$

where $\omega$ depends on the second difference of the pressure in order to
detect shocks. This scheme is called hybrid AUSM.

With HCUSP, on the other hand, the shock structure is analyzed according to
Refs. [3,4] for a shock with a single interior zone shown in Fig. 2. Again,
the states (L) and (R) satisfy the jump conditions and (L) is supersonic.
Equilibrium of the shock is obtained if the fluxes satisfy

$F_{L/A} = F_{A/R} = F_L = F_R$

Then the flux balance for points L, A, R is zero. The condition at the
entrance to the shock is fulfilled if the flow is supersonic at (L/A). The
condition at the shock exit leads to a Hugoniot equation for a moving shock

$F_R - F_A + \frac{\alpha c}{1+\beta}\,(W_R - W_A) = 0$

This equation can be solved by Roe linearization [10] and yields the relation

$\alpha c = (1 + \beta)(c - u) \quad \text{for } 0 < M < 1$    (6)

between the dissipation coefficients.

Jameson [3] has used equ. (6) in order to derive a suitable form of the
dissipation coefficients $\alpha$ and $\beta$, i.e.

$\alpha c = |M|\,c - \beta u, \qquad \beta = \max\left(0,\ \frac{u + \lambda^{-}}{u - \lambda^{-}}\right), \qquad 0 < M < 1$

by which central differencing is again approached for $M \to 0$. We note that
equ. (6) is not respected for M < 0.5, so that one can expect problems with
shock capturing if the Mach number at interface (A/R) is less than 0.5.
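
The remark about M < 0.5 can be checked numerically. The sketch below rests on assumptions that are not confirmed by this text: the coefficient forms $\alpha c = |M| c - \beta u$ and $\beta = \max(0, (u+\lambda^{-})/(u-\lambda^{-}))$ as they are usually quoted for H-CUSP, and $\lambda^{-} = u - c$ for the eigenvalue. The function names are illustrative.

```python
def hcusp_coefficients(u, c):
    """Return (alpha*c, beta) for 0 < M < 1, assuming lambda_minus = u - c."""
    mach = u / c
    lam_minus = u - c
    beta = max(0.0, (u + lam_minus) / (u - lam_minus))
    alpha_c = abs(mach) * c - beta * u
    return alpha_c, beta


def relation_6_residual(u, c):
    """Left side minus right side of relation (6); zero when (6) is satisfied."""
    alpha_c, beta = hcusp_coefficients(u, c)
    return alpha_c - (1.0 + beta) * (c - u)


residual_high = relation_6_residual(u=0.75, c=1.0)   # M = 0.75
residual_low = relation_6_residual(u=0.25, c=1.0)    # M = 0.25
```

Under these assumptions $\beta = 2M - 1$ for M > 0.5, which satisfies relation (6) exactly, while for M < 0.5 the max() clips $\beta$ to zero and the relation no longer holds.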


The spatial accuracy of the flux vector split schemes depends on the
determination of the left and right states at the cell interfaces. For a
first-order scheme the flow quantities at the left and right states are given
by their values at the neighboring mesh points, i.e. i and i+1, respectively.
In the present work higher order accuracy is obtained with the MUSCL approach,
by which the flow quantities are extrapolated to yield left and right states.
The extrapolation function is designed such that the accuracy is limited to
first order at discontinuities in order to guarantee shock capturing without
spurious oscillations. Unfortunately, we find that the two flux vector split
approaches described above should not be combined with the same extrapolation
functions.


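The AUSM scheme works well with the van Albada limiter function

$U_L = U_i + \frac{1}{2}\,\frac{(\Delta_{+}^{2} + \epsilon)\,\Delta_{-} + (\Delta_{-}^{2} + \epsilon)\,\Delta_{+}}{\Delta_{+}^{2} + \Delta_{-}^{2} + 2\epsilon}$    (7)

where $\Delta_{+} = U_{i+1} - U_i$, $\Delta_{-} = U_i - U_{i-1}$.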
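
A small sketch of the limited extrapolation of equ. (7), assuming the usual van Albada form with a small regularisation parameter. The function name and the default value of `eps` are illustrative choices, not values from this paper.

```python
def van_albada_left_state(u_im1, u_i, u_ip1, eps=1e-12):
    """Left interface state at i+1/2 from the cell values U_{i-1}, U_i, U_{i+1}."""
    d_plus = u_ip1 - u_i      # forward difference
    d_minus = u_i - u_im1     # backward difference
    numerator = (d_plus * d_plus + eps) * d_minus + (d_minus * d_minus + eps) * d_plus
    denominator = d_plus * d_plus + d_minus * d_minus + 2.0 * eps
    return u_i + 0.5 * numerator / denominator


smooth = van_albada_left_state(0.0, 1.0, 2.0)     # linear data
extremum = van_albada_left_state(0.0, 1.0, 0.0)   # local maximum at cell i
jump = van_albada_left_state(0.0, 0.0, 1.0)       # discontinuity just ahead of cell i
```

On linear data the extrapolation reaches the interface midpoint value, while at a local extremum or next to a jump it falls back to the cell value, which is the first-order behaviour required at discontinuities. A larger `eps` weakens the limiting, in line with the statement that the parameter is made large where limiting should be avoided.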

We extrapolate the primitive variables and the total enthalpy using equ. (7).
Extrapolation of the latter quantity is needed in the energy flux in order to
allow steady state solutions with constant energy. Also, the parameter
$\epsilon$ is made large if the contravariant velocity is smaller than a
certain fraction of the speed of sound. In this way, clipping within boundary
layers and false interpolation values of the contravariant velocity components
are avoided. Typical results of limiter applications for high Reynolds number
viscous flows are shown in Fig. 3. The pressure contours at the rear part of
the RAE 2822 airfoil at transonic flow conditions show oscillations near the
edge of the turbulent boundary layer if limiting of the Cartesian velocity
components is applied in the traditional manner. These oscillations disappear
if the limiting operator is switched off for small values of the Mach number
in the contravariant coordinate direction. More technical details of the
limiter can be found in Ref. [5].

Unfortunately, we find that the application of the van Albada limiter with the
HCUSP scheme yields some preshock oscillations. Hence, we use the limiting
function described in [4] for the HCUSP scheme. That function has also been
extended to avoid limiting in low Mach number regions, by which the accuracy
of the results is generally improved.

Finally, it is necessary to add controlled artificial dissipation in flow
regions where the damping characteristics of the basic scheme are too poor to
allow proper convergence to the steady state solution. Fig. 4 shows the
situation of a cell with high aspect ratio in two dimensions. In this case
transient modes in the direction of the short cell edge will be well damped by
an explicit time integration method, whereas modes along the long side of the
cell remain almost undamped. This problem can be solved by a modification of
the advection function (AUSM scheme),

$F^{c} = \frac{1}{2}\,|S|\,\Big[\, M_{L/R}\,(\text{sum advection}) - \Phi\,(\text{diff advection}) \,\Big]$

where $\Phi$ is a function [5] of the spectral radii, $\lambda$, in the
coordinate directions i and j so that

$\Phi = |M_{L/R}|$  for $\lambda_i \gg \lambda_j$
$\Phi = \delta$  for $\lambda_i \ll \lambda_j$

Typical values of $\delta$ used in the present work are $\delta = 1/4$. This
adaptive dissipation formulation makes sure that boundary layers are not
numerically smeared but there is sufficient damping of modes in the direction
of long cell sides. A similar formulation has been implemented into the HCUSP
scheme.

The capabilities of the present discretization schemes for perfect gas flows
with shocks and shear layers are assessed by computations of transonic and
hypersonic two-dimensional flows. Fig. 5 compares distributions of pressure
coefficient, total pressure loss and grid convergence of the aerodynamic
coefficients for transonic inviscid flow over the NACA 0012 airfoil. AUSM+ and
HCUSP yield comparable shock resolution, whereas the hybrid AUSM appears to be
more dissipative at the shock. The HCUSP scheme generates more entropy at the
leading edge, and lift and drag values converge somewhat more slowly with grid
density as compared to AUSM. On the other hand, HCUSP is more rapid with
respect to the residual convergence as compared to AUSM. Typical convergence
rates of the multigrid method described below are 0.90 for HCUSP and 0.94 for
AUSM.

The resolution of very strong shocks and hypersonic shear layers is shown in
Fig. 6. The Mach number contours obtained for inviscid flow around a blunted
wedge demonstrate almost perfect shock capturing within one cell for AUSM+ and
HCUSP, whereas hybrid AUSM needs one interior point for this case. The
resolution of the thermal boundary layer, which is displayed on the right part
of Fig. 6, is similar for HCUSP and AUSM. We note that both schemes give much
better results compared to a conventional central-difference scheme with a
single scalar viscosity (not shown here).


4.  OPERATOR  SPLITTING  AND  IMPLICIT 
TREATMENT  OF  THE  CHEMICAL  SOURCE 
TERMS 

For flows with nonequilibrium chemistry additional conservation equations with
chemical source terms occur, which render the system of equations stiff if the
time scale of the chemical reactions is significantly smaller than the fluid
mechanics time scale. A simplified form of the conservation equation is given
by

$\frac{\partial W}{\partial t} = -F + S$

$W = (\rho, \rho u, \rho v, \rho w, \rho E, \rho_2, \ldots, \rho_n)^{T}$

$S = (0, 0, 0, 0, 0, S_2, \ldots, S_n)^{T}$,  F = discrete flux

The full set of equations used for reacting flow is given in Refs. [11, 12].
In order to overcome the time step limitations due to small chemical time
scales we employ implicit discretization of the source terms,

$\frac{\Delta W}{\Delta t} = -F^{n} + S^{n+1}$

Using a linearization of the source term at time level (n) one obtains a
point-implicit update of the solution vector W for the time level (n+1),

$\frac{\Delta W}{\Delta t} = \left[\, I - \Delta t\,\frac{\partial S}{\partial W} \,\right]^{-1} \left[\, -F^{n} + S^{n} \,\right]$    (8)


The Jacobian matrix has no entries in the first five rows
because these equations have no source terms. Hence,
the update of eq. (8) can be broken up into a fully explicit
update for $W_1 = (\rho, \rho u, \rho v, \rho w, \rho E)$ followed
by a point-implicit update for $W_2 = (\rho_2, \ldots, \rho_n)$, which
involves the solution of a linear system of order (n-1) at
each grid point.

The evaluation of the explicit source vector, $S^n$, of the
elements of the source Jacobian, $\partial S/\partial W$, and the solution
of the linear system usually take much more computer
time than the remaining elements of the solution algorithm.
For multistage time stepping schemes the linearization
of the source vector around the old time level is
appropriate [13] and hence, the derivatives $\partial S/\partial W$ can
be held constant during all stages.
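
The point-implicit update can be sketched on a toy problem. The example below is a minimal illustration, not the CEVCATS implementation: it assumes a hypothetical two-species relaxation A <-> B with made-up rate constants `KF` and `KB`, sets the flux residual to zero, and solves the small linear system [I - dt dS/dW] dW = dt (-F + S) at a single point. The helper names `solve` and `point_implicit_step` are introduced here for illustration only.

```python
def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]
    for c in range(n):
        p = max(range(c, n), key=lambda r: abs(M[r][c]))
        M[c], M[p] = M[p], M[c]
        for r in range(c + 1, n):
            f = M[r][c] / M[c][c]
            for k in range(c, n + 1):
                M[r][k] -= f * M[c][k]
    x = [0.0] * n
    for r in reversed(range(n)):
        x[r] = (M[r][n] - sum(M[r][k] * x[k] for k in range(r + 1, n))) / M[r][r]
    return x

def point_implicit_step(W, flux_residual, source, jacobian, dt):
    """One update [I - dt dS/dW] dW = dt (-F^n + S^n) at a single grid point."""
    n = len(W)
    F, S, J = flux_residual(W), source(W), jacobian(W)
    A = [[(1.0 if i == j else 0.0) - dt * J[i][j] for j in range(n)] for i in range(n)]
    dW = solve(A, [dt * (-F[i] + S[i]) for i in range(n)])
    return [W[i] + dW[i] for i in range(n)]

# Hypothetical two-species relaxation A <-> B (not the paper's air chemistry).
KF, KB = 1000.0, 500.0

def source(W):
    a, b = W
    return [-KF * a + KB * b, KF * a - KB * b]

def jacobian(W):
    return [[-KF, KB], [KF, -KB]]

def no_flux(W):
    return [0.0, 0.0]

dt = 0.1  # far larger than the chemical time scale 1 / (KF + KB)
W_imp, W_exp = [1.0, 0.0], [1.0, 0.0]
for _ in range(20):
    W_imp = point_implicit_step(W_imp, no_flux, source, jacobian, dt)
    W_exp = [w + dt * s for w, s in zip(W_exp, source(W_exp))]

print("point-implicit:", W_imp)  # approaches the equilibrium (1/3, 2/3)
print("explicit:      ", W_exp)  # grows without bound at this time step
```

In this toy setting the source is linear, so the linearized update coincides with backward Euler and remains stable at a time step where the explicit update diverges. For a nonlinear source the linearization is only approximate.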


5.  MULTIGRID  METHOD  FOR  HIGH  SPEED 
FLOWS 

Explicit  multistage  time-stepping  schemes  are  used  for 
advancing  the  solution  in  time.  Choosing  the  number  of 
stages  and  the  stage  coefficients  allows  an  optimization 
of  the  high-frequency  damping  properties  of  the  scheme 
at  relatively  high  Courant  numbers.  Hence,  these 


schemes  can  be  combined  with  multigrid  algorithms  in 
order  to  accelerate  convergence  to  steady-state,  accord¬ 
ing  to  Ref.  [8]. 

Coarse meshes for the multigrid are obtained by eliminating
alternate points in each coordinate direction. Both
the  solution  and  the  residuals  are  restricted  from  fine  to 
coarse  meshes.  A  forcing  function  is  constructed  so  that 
the  solution  on  a  coarse  mesh  is  driven  by  residuals  col¬ 
lected  on  the  next  finer  mesh.  The  corrections  obtained 
on  the  coarse  mesh  are  interpolated  back  to  the  fine 
mesh.  This  multigrid  scheme  is  now  widely  used  in  the 
CFD  community  and  it  works  quite  well  for  a  wide 
range  of  subsonic  and  transonic  flow  problems. 

However,  a  number  of  modest  modifications  of  the  orig¬ 
inal  multigrid  scheme  are  necessary  for  high  Mach  num¬ 
ber  flows  with  strong  shocks  and  strong  variations  of 
viscosity  and  conductivity  coefficients.  We  employ  a 
special  set  of  Runge-Kutta  coefficients  which  are  opti¬ 
mized  for  damping  with  upwind  discretization  and  re¬ 
sidual  smoothing  [14].  Courant  numbers  of  about  5  are 
used  in  the  present  work  which  is  about  twice  the  ex¬ 
plicit  stability  limit.  Strong  variations  of  viscosity  and 
conductivity  occur  in  hypersonic  viscous  flows.  Typical 
time  scales  of  the  viscous  diffusion  process  may  be 
much  smaller  than  the  convection  time  scale  which  puts 
a  severe  restriction  on  the  time  step  if  purely  explicit 
time  integration  is  sought.  This  problem  may  be  circum¬ 
vented  by  locally  adjusting  the  coefficient  of  the  implicit 
residual  smoothing  scheme  [15],  such  that  the  original 
time  step  based  upon  the  inviscid  flux  vector  is  recov¬ 
ered, 

where $\lambda_\xi$ denotes the spectral radius of the inviscid flux
Jacobian in the $\xi$-coordinate direction and V is the cell
volume.  At  strong  shocks  large  Courant  numbers  ob¬ 
tained  with  the  help  of  residual  smoothing  will  result  in 
solution  divergence.  Therefore  an  adaptive  time  step  is 
employed  such  that  the  Courant  number  is  reduced  to 
about  1  at  strong  shocks  [14]. 

The multigrid scheme involves restriction and prolongation
operators which are both modified for hypersonic
flows. At strong shocks the restriction of residuals from
fine to coarse meshes is damped by using the second difference
of the pressure as a sensor in order to reduce the
coarse-mesh corrections in that region.

We have also observed that the coarse meshes can promote
upstream propagation of transient modes if central
interpolation is used for prolongation of the corrections.
This problem is resolved by using an upwind biased
interpolation of the corrections where the Mach number in
the contravariant coordinate direction is used to define
the bias [16].

6.  NUMERICAL  RESULTS  FOR  HYPERSONIC 
REACTING  FLOWS 

Comparisons between the different flux vector splitting
variants have been presented for 2D calorically perfect
gas flows in Section 3. Our experience with reacting
flows and with complex 3D flows is so far based upon the
hybrid AUSM scheme. The hybrid AUSM spatial
discretization described in Section 3, the implicit
source treatment given in Section 4, and the multigrid
elements of Section 5 are implemented in the 3D
DLR multiblock code CEVCATS [17, 5]. The implementation
of the thermodynamic model, the chemical
reactions and the viscous terms is general, such that any
chemistry model can be employed without modifications
to the source code. Moreover, the code runs effectively
on vector computers by vectorization over all grid
points within a block of the computational domain. Inner
loops over the number of species or the number
of reactions are unrolled by compiler directives [12].
In this way we have obtained a computational speed of
about 1500 MFLOP/s on a single processor of an NEC
SX-3 computer. This corresponds to 50 μs computing
time for the update of a single grid point by one multigrid
cycle, assuming a reacting gas mixture of five
species with 17 chemical reactions. The use of point-implicit
operators and multigrid for reacting flows was also
investigated with the help of a quasi-1D code for nozzle
flows which contains the algorithmic elements presented
in the previous sections.

The  capabilities  of  the  multigrid  method  for  reacting 
flows  with  large  contrast  are  investigated  by  applica¬ 
tions  for  one-,  two-  and  three-dimensional  flows.  At 
first,  we  have  chosen  inviscid  transonic  reacting  flow  in 
a  diverging  nozzle  in  order  to  demonstrate  the  effects  of 
point-implicit  time  stepping  and  multigrid  acceleration 
separately.  Fig.  7  displays  the  distributions  of  tempera¬ 
ture  and  the  concentrations  of  the  three  species  present 
in  the  flow.  The  dissociation  reaction  rate  coefficients 
for  oxygen  have  been  chosen  such  that  strong  reactions 
take  place  for  temperatures  above  1000  K.  Hence,  the 
flow  simulation  represents  shock  induced  dissociation  at 
reentry  flow  conditions.  The  dissociation  time  scale  is 
small  enough  so  that  explicit  time  stepping  alone  does 
not  yield  a  converged  flow  solution  within  several  thou¬ 
sand  time  steps.  With  point-implicit  time  stepping  the 
code  converges  slowly  to  the  steady  state.  Convergence 
is  noticeably  accelerated  by  application  of  4-level  multi¬ 
grid,  by  which  a  convergence  rate  per  multigrid  cycle  of 
about  0.95  is  realized. 

The second application is the viscous reacting flow over
a 2D cylinder, which is displayed in Fig. 8. This case has
been used to check the accuracy of the reacting gas chemistry
and thermodynamics implemented in the 3D code CEVCATS by
grid convergence studies and comparisons with other
available codes (not shown here). It is found that both
shock layer and wall heat fluxes are well predicted with
relatively small numbers of grid points. The convergence
histories plotted in Fig. 9 indicate again that multigrid
is effective for reacting flow problems.

Finally,  we  present  numerical  results  for  a  complex  3D 
case  in  Figs.  10-14.  The  configuration  is  called  HALIS 
and  it  represents  the  windward  side  of  the  US-Orbiter 
vehicle.  Extensive  numerical  and  experimental  data  is 
available  for  this  configuration.  A  first  set  of  computa¬ 
tions  has  been  executed  for  wind  tunnel  conditions  at 
Mach=10.  These  computations  include  the  forebody,  de¬ 
flected  body  flaps  and  the  base  flow  behind  the  rear  of 
the  vehicle.  The  computations  were  done  on  a  mesh 
with  2.6  million  grid  points.  Additionally,  local  grid  re¬ 
finement  was  investigated  in  the  separation  region 
around  the  deflected  body  flap.  The  numerical  solutions 
have  been  extensively  studied  with  respect  to  grid  con¬ 
vergence  and  the  solutions  compare  very  well  with  wind 
tunnel  measurements.  The  computation  is  a  significant 
accomplishment  because  of  the  large  regions  with  flow 
separation  present. 

The second computation was done for a flight trajectory
point at Mach=24 and 72 km altitude, assuming air in
chemical nonequilibrium (see Figs. 12, 13). So far,
numerical results have been obtained for the forebody of
HALIS only. The grid convergence studies and the comparison
with existing numerical data [18] indicate that the
solution resolves the relevant flow phenomena properly.
The residual convergence is displayed in Fig. 14 for
both the nonreacting and the reacting flow cases. It is
seen that the rate of convergence is approximately the
same for both conditions. The computation of 400 multigrid
cycles for the HALIS forebody with 1.3 million grid
points took 6 hours on an NEC SX-3 computer. Hence, it is
concluded that converged flow solutions for reacting
flows over complex configurations are now feasible.

Abstract

The purpose of this paper is to present a hybrid upwind
splitting method fully adapted to viscous flows in chemical
and thermal nonequilibrium. Such flows are
the site of strong viscous-inviscid interactions and are
dominated by real gas effects due to dissociation and
internal mode excitation. Furthermore, the hypervelocities
along the reentry trajectory induce a large
degree of thermo-chemical nonequilibrium. ONERA
has developed a code for simulating such flows: the
code CELHYO. Since the physical modeling has already
been presented in detail in previous papers [4] [12],
emphasis is put here on the numerical method, and
particularly on the extension of
hybrid upwind splitting methods to nonequilibrium
flows. The hybrid upwinding is achieved by combining
the basically distinct Flux Vector and Flux Difference
Splitting approaches while retaining the attractive
features of each. The hybrid method implemented in the code
CELHYO has been obtained by hybridizing the Osher
approach with the van Leer scheme. In order to illustrate
the numerical method, internal and external flow
configurations are presented.

Résumé

Numerical methods suited to the prediction of viscous
flows in thermodynamic and chemical nonequilibrium are
presented here. This paper deals in particular with the
development of upwind schemes well suited to the
approximation of the inviscid fluxes in the context of
hyperenthalpic viscous flow problems. Initially, the
inviscid fluxes were treated with Roe's method. Because
this method is delicate to apply to viscous flows, a new
approach to upwinding is proposed [8]. It combines
the two classical approaches, the flux splitting approach
and the Godunov-type approach. We outline the main
steps of which it is composed and describe in this paper
only one particular hybridization technique, based on the
Osher method and the van Leer method. In order to
illustrate the numerical method used, computed results
for external and internal flow configurations are
presented.

1. Introduction

We are interested in solving the system governing gas
flows in thermal and chemical nonequilibrium. Such flows
occur during the atmospheric reentry of a body or of a
hypersonic vehicle. At these speeds, the flow reaches
very high temperatures near the vehicle. These
temperatures are high enough to induce complex real gas
effects such as the dissociation of air, vibrational
relaxation and possibly ionization. ONERA has developed
a code for the numerical simulation of such flows: the
code CELHYO. Since detailed studies of the physical
modeling have already been the subject of several papers
[4] [12], we concentrate here on the numerical work, and
more particularly on the extension of hybrid schemes to
nonequilibrium flows. The motivation for using such
schemes is to bring the code to the level of robustness,
and also of accuracy, required for the simulation of more
complex flows, for example ionized flows.

Initially, the inviscid fluxes were treated with Roe's
method. Because this method is delicate to apply to
viscous flows, a new approach to upwinding is proposed
[8]. It combines the two classical approaches, the flux
splitting approach and the Godunov-type approach.
The flux splitting approach leads to simple
approximations that prove very robust in practice but
have the main drawback of ignoring the structure of the
relaxed solution, in particular the linear waves (contact
discontinuities) of which it is composed. Respecting the
linear waves is crucial for viscous flows, and violating
this requirement makes flux splitting methods unsuitable
for their simulation. The Godunov-type approach satisfies
this requirement at the cost of a more complex
approximation, but it has a decisive advantage over the
flux splitting approach for viscous problems. However,
these methods may exhibit various stability defects in
the capture of nonlinear waves (shock and rarefaction
waves).

The hybridization approach to upwinding (HUS methods,
"Hybrid Upwind Splitting") combines the two approaches
mentioned above so as to retain only those properties
deemed suitable for the simulation of hyperenthalpic
viscous flows. In the code, the upwind scheme resulting
from the hybridization of the Osher scheme with the
van Leer scheme has been implemented. The Osher method
and a van Leer type method have also been implemented in
the code.

2. Modeling and balance equations

In this study we consider an ideal mixture of perfect
gases made up of ns species, nm of which are molecular.
For air, the five main species N2, O2, NO, N and O are
taken into account. The translational and rotational
modes and the electronic mode are always assumed to be
in equilibrium and are therefore characterized by a
single temperature T, whereas the vibrational modes may
depart from equilibrium. We assume that, among the nm
molecular species, nv of them (nv ≤ nm) have their
vibrational modes in nonequilibrium (N2, O2 and possibly
NO for air). We are interested in the two-dimensional
evolution of this mixture, which is governed by the
following system of conservation laws:

$$\partial_t u + \mathrm{div}\big(f(u) - \mathcal{D}(u)\,\mathrm{grad}\,u\big) = \Omega. \tag{1}$$

f denotes the inviscid fluxes. The dissipative phenomena
are modeled here by the tensor $\mathcal{D}$. The source term
$\Omega$ accounts for the nonequilibrium phenomena. In what
follows, $U$, an open subset of $\mathbb{R}^p$ with p = ns + nv + 3,
denotes the state space. The unknown
$u : \mathbb{R}^+ \times \mathbb{R}^2 \to U$ reads

$$u = \big((\rho_\alpha)_{1\le\alpha\le ns},\ \rho v_1,\ \rho v_2,\ \rho E,\ (\rho_\beta e_{v,\beta})_{1\le\beta\le nv}\big), \tag{2}$$

where E is the total energy of the mixture per unit mass
and $v = (v_1, v_2)$ is the barycentric velocity of the mixture.
$e_{v,\beta}$ denotes the vibrational energy per unit mass
of the molecular species $\beta$.

The gas pressure is given by Dalton's law:

$$p = \sum_{\alpha=1}^{ns} \rho_\alpha \frac{R_{gp}}{M_\alpha}\, T, \tag{3}$$

where $R_{gp}$ denotes the universal gas constant
and $M_\alpha$ is the atomic mass of species $\alpha$.

This system is closed by a general thermodynamic
relation such that the mixture pressure satisfies:

$$p = \kappa_{tr}\Big(\rho E - \tfrac{1}{2}\rho |v|^2 - \sum_{\beta} \rho_\beta e_{v,\beta} - \sum_{\alpha} \rho_\alpha \big(e_\alpha + h^0_\alpha\big)\Big), \tag{4}$$

where $\kappa_{tr} = \gamma_{tr} - 1$; $e_\alpha$ and $h^0_\alpha$ denote,
respectively, the energy of the internal modes in
equilibrium with the translational temperature and the
heat of formation of species $\alpha$ per unit mass.

The detailed expressions of the source terms and of the
tensor of dissipative phenomena can be found in earlier
papers [4], [12]. We only recall that the chemistry model
chosen is that of Gardiner [9]. It involves 17 reactions,
comprising fifteen dissociation reactions and two
exchange reactions. The equations for the vibrational
energies may include Translation-Vibration (T-V),
Vibration-Vibration (V-V) or Vibration-Dissociation
(V-D) energy exchanges. The T-V energy exchange rate is
modeled by a Landau-Teller model, the relaxation times
between species being given by the semi-empirical law of
Millikan and White [14].

The viscous stress tensor uses the model of Armaly and
Sutton [2] for the viscosity of the mixture, the viscosity
of each species being determined by Blottner's relation
[3]. The diffusion velocity of the species obeys Fick's
law with a binary diffusion coefficient. The heat fluxes
of the mixture and of vibration are assumed to follow
Fourier laws. The detailed expressions of the thermal
conductivity coefficients are given in [4].

3. Numerical method

The solutions of the convective-dissipative system (1) are
approximated by an implicit finite volume method
written for curvilinear meshes. This method is described
below, first in its semi-discrete formulation in space
and then in its implicit formulation in time. Only the
numerical methods dealing with the approximation of the
extracted Euler system are discussed here. The
discretization of the second-order operator expressing
the dissipative phenomena relies on a centered method
already presented in an earlier contribution to AGARD
[12]. The methods associated with the Euler system and
implemented in the code CELHYO make use of several
upwinding techniques. They come, respectively, from
upwinding by approximate solution of the Riemann
problem, from upwinding by flux splitting and, finally,
from an original technique referred to as field-by-field
hybrid upwinding. The latter technique results from
research carried out jointly by ONERA and NASA [8].
We describe below three of the upwind schemes used in
the code: the approximate Riemann solver of
Osher-Solomon, the van Leer flux splitting and, finally,
the HUS scheme resulting from the hybridization of the
two previous methods. The code CELHYO also offers other
schemes, in particular those of Roe, Godunov and
Colella-Glaz, which are not reported here.

3.1 Two-dimensional finite volume methods
The numerical approximation of the unknown u of the
system is obtained with a finite volume method whose
time-continuous formulation reads, omitting the source
terms and the dissipative phenomena, for a cell K with
boundary $\partial K$:

$$(u_K)_t + \sum_{e \in \partial K} f(u_K, u_{K_e}; n_{K,e})\,|e| = 0, \tag{5}$$

where $u_K$ (respectively $u_{K_e}$) denotes the constant value
of the approximate solution on the cell K (respectively
$K_e$). By definition, the neighboring cell $K_e$ shares the
edge e; $n_{K,e}$ is the unit normal to e pointing out of the
element K. The mapping f, which takes two states of $U$
and a unit normal and returns a vector of $\mathbb{R}^p$, denotes a
two-dimensional numerical flux subject to the usual
conservation and consistency conditions. By virtue of
the rotational invariance of the Euler equations, the
evaluation of the two-dimensional numerical fluxes is
deduced from the evaluation of a numerical flux
consistent with a one-dimensional Riemann problem in
which the exact flux reads:

$$f_{i_1} = \big((\rho_\alpha v_1)_{1\le\alpha\le ns},\ \rho v_1^2 + p,\ \rho v_1 v_2,\ (\rho E + p)\,v_1,\ (\rho_\beta e_{v,\beta}\, v_1)_{1\le\beta\le nv}\big), \tag{6}$$

with $i_1$ the first vector of the canonical basis of $\mathbb{R}^2$.

Consider $f(u_K, u_{K_e}; i_1)$, a numerical flux consistent
with the exact flux $f_{i_1}$. Introducing the rotation that
maps the basis vector $i_1$ onto the normal $n_{K,e}$,
we define the rotation operator $T_{K,e}$, which allows us
to obtain a two-dimensional numerical flux by setting:

$$f(u_K, u_{K_e}; n_{K,e}) = T_{K,e}^{-1}\, f(T_{K,e}u_K,\ T_{K,e}u_{K_e};\ i_1). \tag{7}$$

In what follows we shall, by abuse of notation, write f
for $f_{i_1}$ in order to lighten the notation.
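
The rotation construction (7) can be illustrated with a short sketch. It is not the CELHYO implementation: it assumes a single-species calorically perfect gas with a hypothetical `GAMMA = 1.4`, stores the state as (rho, rho v1, rho v2, rho E), and uses a Rusanov flux as a stand-in for the one-dimensional numerical flux, which is not one of the schemes described in this paper. All function names are introduced here for illustration.

```python
import math

GAMMA = 1.4  # assumption: calorically perfect single-species gas

def rotate(u, n):
    """T_{K,e}: express the momentum of u in the (normal, tangent) frame."""
    rho, m1, m2, E = u
    nx, ny = n
    return (rho, nx * m1 + ny * m2, -ny * m1 + nx * m2, E)

def rotate_back(f, n):
    """T_{K,e}^{-1}: return the momentum components to the (x, y) frame."""
    f_rho, fn, ft, fE = f
    nx, ny = n
    return (f_rho, nx * fn - ny * ft, ny * fn + nx * ft, fE)

def pressure(u):
    rho, m1, m2, E = u
    return (GAMMA - 1.0) * (E - 0.5 * (m1 * m1 + m2 * m2) / rho)

def exact_flux_x(u):
    """Exact flux along i_1, the single-species analogue of eq. (6)."""
    rho, m1, m2, E = u
    p, v1 = pressure(u), m1 / rho
    return (m1, m1 * v1 + p, m2 * v1, (E + p) * v1)

def rusanov_x(uL, uR):
    """Stand-in 1D numerical flux along i_1 (not one of the paper's schemes)."""
    def speed(u):
        return abs(u[1] / u[0]) + math.sqrt(GAMMA * pressure(u) / u[0])
    a = max(speed(uL), speed(uR))
    fL, fR = exact_flux_x(uL), exact_flux_x(uR)
    return tuple(0.5 * (l + r) - 0.5 * a * (q - s)
                 for l, r, q, s in zip(fL, fR, uR, uL))

def flux_2d(uK, uKe, n):
    """Eq. (7): rotate both states, apply the 1D flux, rotate the result back."""
    return rotate_back(rusanov_x(rotate(uK, n), rotate(uKe, n)), n)

uK = (1.0, 0.3, -0.2, 2.5)
uKe = (0.8, 0.1, 0.4, 2.0)
n = (0.6, 0.8)
print(flux_2d(uK, uKe, n))
```

Two properties are worth checking on such a sketch: consistency (equal states return the exact normal flux) and conservation (swapping the states and reversing the normal changes the sign of the flux).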


3.2 Main properties of the system
Under general thermodynamic assumptions, the system is
hyperbolic. The associated Jacobian matrix, denoted
$\nabla f(u)$, is diagonalizable and has p eigenvalues
$\lambda_k$, $1 \le k \le p$, comprising two simple eigenvalues
$\lambda_1 = v_1 - c$, $\lambda_p = v_1 + c$ and one multiple
eigenvalue $\lambda_k = v_1$, $2 \le k \le p - 1$. Here
$c = \sqrt{\gamma p/\rho}$ denotes the speed of sound.

In the case of general thermodynamics, the Riemann
invariants associated with the genuinely nonlinear 1- and
p-fields are not known explicitly. They are given by the
following differential equations:

$$dY_\alpha = 0, \qquad 1 \le \alpha \le ns - 1, \tag{8}$$

$$de_{v,\beta} = 0, \qquad 1 \le \beta \le nv, \tag{9}$$

$$\frac{dp}{p} - \gamma\,\frac{d\rho}{\rho} = 0, \tag{10}$$

$$dv_1 \pm \frac{dp}{\rho c} = 0. \tag{11}$$

When the thermodynamic closure relation is not a linear
function of the translational temperature, $\gamma$ depends
on the temperature and equations (10) and (11) cannot be
integrated easily. In this work we propose to integrate
relations (10) and (11) approximately, neglecting the
temperature dependence of $\gamma$, which gives the
following invariants:

$$\Big((Y_\alpha)_{1\le\alpha\le ns-1},\ v_1 \pm \frac{2c}{\gamma - 1},\ \frac{p}{\rho^{\gamma}},\ v_2,\ (Y_\beta e_{v,\beta})_{1\le\beta\le nv}\Big). \tag{12}$$

For the linearly degenerate k-fields, the invariants are
$v_1$ and p.


3.3 Osher-Solomon method
This method relies on the approximate solution of the
Riemann problem obtained by treating each simple wave as
a rarefaction-compression wave. It is thus defined by the
construction of a path in the state space connecting
$u_L$ to $u_R$, obtained by following, in the
velocity-pressure plane, the admissible and non-admissible
parts of the rarefaction curves issued from these two
states. The ordering of the rarefaction curves retained in
this work is the natural ordering $(v - c, v, v + c)$. Such a
path, denoted $\Phi(u_L, u_R)$ in what follows, exists under
general thermodynamic conditions as long as there is no
cavitation. It is made up of two sub-paths of genuinely
nonlinear (GNL) type, denoted $\Phi_1$ and $\Phi_3$, and of
one sub-path, $\Phi_2$, of linearly degenerate (LD) type
associated with the contact discontinuity. Once
constructed, this path completely defines the numerical
flux associated with the Osher-Solomon scheme as:

$$f^{OS}(u_L, u_R) = \frac{1}{2}\Big(f(u_L) + f(u_R) - \int_{\Phi(u_L,u_R)} |\nabla_u f(u)|\,du\Big). \tag{13}$$

We refer to [5] for the detailed expression of this flux.
We focus here on the algorithm for constructing the path
$\Phi(u_L, u_R)$ that we have associated with the gas mixture
of interest. We refer to Abgrall et al. [1] for another
procedure.

Let $u_L^*$ and $u_R^*$ denote the states separated by the
contact discontinuity propagating at the speed $v^*$.
These states are constructed by solving the following
problem, which expresses the conservation of the Riemann
invariants and the continuity of pressure and velocity
across the contact discontinuity:

$$Y_\alpha|_L^* = Y_\alpha|_L, \qquad Y_\alpha|_R^* = Y_\alpha|_R, \tag{14}$$

$$\frac{p^*}{(\rho_L^*)^{\gamma_L}} = \frac{p_L}{\rho_L^{\gamma_L}}, \qquad \frac{p^*}{(\rho_R^*)^{\gamma_R}} = \frac{p_R}{\rho_R^{\gamma_R}}, \tag{15}$$

$$v^* + \frac{2}{\gamma_L - 1}\,c_L^* = v_L + \frac{2}{\gamma_L - 1}\,c_L, \tag{16}$$

$$v^* - \frac{2}{\gamma_R - 1}\,c_R^* = v_R - \frac{2}{\gamma_R - 1}\,c_R, \tag{17}$$

$$v_2|_L^* = v_2|_L, \quad v_2|_R^* = v_2|_R, \quad Y_\beta e_{v,\beta}|_L^* = Y_\beta e_{v,\beta}|_L, \quad Y_\beta e_{v,\beta}|_R^* = Y_\beta e_{v,\beta}|_R. \tag{18}$$

It is easy to see that solving the above problem reduces
to finding $v^*$, the solution of the equation

$$p_R(v) - p_L(v) = 0, \tag{19}$$

which expresses the continuity of pressure and velocity
at the contact discontinuity. Once this problem is
solved, the other quantities follow. Here we have:

$$p_R(v) = p_R\Big(1 + \frac{\gamma_R - 1}{2}\,\frac{v - v_R}{c_R}\Big)^{\frac{2\gamma_R}{\gamma_R - 1}}, \qquad \bar v_R \le v,$$

$$p_L(v) = p_L\Big(1 - \frac{\gamma_L - 1}{2}\,\frac{v - v_L}{c_L}\Big)^{\frac{2\gamma_L}{\gamma_L - 1}}, \qquad v \le \bar v_L.$$

The mapping $p_R(v) - p_L(v)$ is strictly increasing and has
at most one root, denoted $v^*$. To compute this root, it
is useful to introduce $\bar v_L$ and $\bar v_R$ defined by

$$\bar v_L = v_L + \frac{2}{\gamma_L - 1}\,c_L, \qquad \bar v_R = v_R - \frac{2}{\gamma_R - 1}\,c_R,$$

and to note [5] that $v^*$ can be written as a convex
combination of these two particular velocities. There
therefore exists a real $z^* \in [0, 1]$ such that

$$v^* = z^*\,\bar v_L + (1 - z^*)\,\bar v_R. \tag{20}$$

Note that $\bar v_L - \bar v_R > 0$ except precisely when
cavitation occurs. Using (20), the search for the root of
equation (19) can be recast as follows. Find the real
$z^* \in [0, 1]$ solving the equation:

$$C_{L,R}\,(1 - z)^{\frac{2\gamma_L}{\gamma_L - 1}} - z^{\frac{2\gamma_R}{\gamma_R - 1}} = 0, \tag{21}$$

where we have set

$$C_{L,R} = \frac{p_L}{p_R}\,\frac{\big(2c_R/(\gamma_R - 1)\big)^{\frac{2\gamma_R}{\gamma_R - 1}}}{\big(2c_L/(\gamma_L - 1)\big)^{\frac{2\gamma_L}{\gamma_L - 1}}}\,\big(\bar v_L - \bar v_R\big)^{\frac{2\gamma_L}{\gamma_L - 1} - \frac{2\gamma_R}{\gamma_R - 1}}.$$

When $\gamma_L \ne \gamma_R$, the above equation has no explicit
root and its determination requires a Newton-type
iterative procedure. To optimize the convergence rate, we
propose to replace problem (21) by the following
equivalent problem, which is very well conditioned. Find
the real $z^* \in [0, 1]$ solving $g(z) = 0$, where:

$$g(z) = \begin{cases} C_{L,R}\,\dots\,(1 - z)^{\frac{\gamma_L(\gamma_R - 1)}{\gamma_R(\gamma_L - 1)}} - z, & \text{if } \dots \\ (1 - z) - \dots, & \text{otherwise.} \end{cases} \tag{22}$$

To initialize the Newton algorithm, we define

$$z_1 = \dots, \qquad z_2 = \dots \tag{23}$$

It can be verified that the value closest to the root
$z^*$ is given by

$$z_{ini} = \begin{cases} \max(z_1, z_2), & \text{if } \dots < 1, \\ \min(z_1, z_2), & \text{otherwise,} \end{cases} \tag{24}$$

a value that is used as the initial guess.
Note that in the case $\gamma_L = \gamma_R$, $z_{ini} = z_1 = z_2$
coincides with the root $z^*$ of the equation considered.
The algorithm (22)-(24) generally converges in at
most 3 iterations for a stopping criterion of 10“® on
the relative error $|z^{k+1}/z^{k} - 1|$.
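
The search for the contact speed can be sketched as follows. This is a minimal illustration under stated assumptions, not the CELHYO algorithm: it takes a frozen $\gamma$ on each side, as in the approximate invariants (12), and replaces the Newton iteration on g(z) by plain bisection on $p_R(v) - p_L(v)$, which is valid because that function is strictly increasing in z. The function name `contact_speed` and its argument list are hypothetical.

```python
import math

def contact_speed(pL, cL, vL, gL, pR, cR, vR, gR, tol=1e-12):
    """Return (z*, v*) with v* = z* vbar_L + (1 - z*) vbar_R, eq. (20).

    Assumes frozen gamma on each side; bisection stands in for Newton."""
    a = 2.0 * gL / (gL - 1.0)
    b = 2.0 * gR / (gR - 1.0)
    vbar_L = vL + 2.0 * cL / (gL - 1.0)
    vbar_R = vR - 2.0 * cR / (gR - 1.0)
    d = vbar_L - vbar_R
    if d <= 0.0:
        raise ValueError("cavitation: no intermediate state")

    def pressure_jump(z):
        # p_R(v) - p_L(v) evaluated at v = z vbar_L + (1 - z) vbar_R
        p_right = pR * (z * d * (gR - 1.0) / (2.0 * cR)) ** b
        p_left = pL * ((1.0 - z) * d * (gL - 1.0) / (2.0 * cL)) ** a
        return p_right - p_left

    lo, hi = 0.0, 1.0  # the jump is negative at z = 0 and positive at z = 1
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if pressure_jump(mid) < 0.0:
            lo = mid
        else:
            hi = mid
    z = 0.5 * (lo + hi)
    return z, z * vbar_L + (1.0 - z) * vbar_R

# Illustrative shock-tube-like states with equal gamma (values are assumptions).
g = 1.4
cL = math.sqrt(g * 1.0 / 1.0)
cR = math.sqrt(g * 0.1 / 0.125)
z_star, v_star = contact_speed(1.0, cL, 0.0, g, 0.1, cR, 0.0, g)
print(z_star, v_star)
```

When both sides share the same $\gamma$, equation (21) has the closed-form root $z^* = C^{1/a}/(1 + C^{1/a})$ with $a = 2\gamma/(\gamma - 1)$, which gives a convenient check on the iterative result.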

3.4  Methode  de  decomposition  de  van  Leer 

L’extension  de  la  methode  de  van  Leer  aux  equa¬ 
tions  d’Euler  multi-especes  et  multi-temperatures  a 
fait  I’objet  de  quelques  travaux  (voir  en  particulier 
[11]).  Les  extensions  proposees  conduisent  a  une 
famille  de  schemeis  a  un  degre  de  liberte  parametrant 
la  decomposition  de  flux  d’energie.  Le  choix  d’une 
decomposition  de  flux  particuliere  doit  etre  opere  de 
maniere  a  assurer  que  les  matrices  jacobiennes  des  flux 
decomposes  ±V/^(u)  n’admettent  que  des  valeurs 
propres  reelles  positives  ou  nulles.  Toutefois,  il  ressort 
de  ces  travaux  qu’une  telle  propriety  est  difficile  a 
garantir  pour  tout  u  de  I’espace  des  etats  dans  le  cadre 
d’une  thermodynamique  non  lineaire  en  T.  Nous  avons 
privilegle  dans  le  code  CELHYO  le  representant  de 
la  famille  consideree  permettant  de  preserver  la  con- 
stance  de  I’enthalpie  totale  a  la  traversee  d’un  choc 
stationnaire.  Ce  schema,  brievement  decrit  ci-dessous, 
s’est  revele  entierement  satisfaisant  dans  les  applica¬ 
tions  pratiques. 

En  introduisant  le  nombre  de  Mach  M  —  v/c,  les 
flux  decomposes  se  reduisent  a  /■*"  (u)  =  f  (u),  f~  (u)  = 
0  lorsque  M  >  1  et  symetriquement  a  /"^(u)  = 


0, 

/-(u)  = 

f(u)  lorsque  M  <  —  1.  Pour  |Af| 

<  1. 

ces 

flux  sont 

definis  par  les  expressions  suivantes 

fi 

=  ±^(Af  ±  l)*PaC,  1  <  a  <  ns. 

(25) 

J  pvi 

=  (('± 

(26) 

fpvi 

(27) 

f± 

JpE 

(28) 

Jppt 

^p=ppe^,pff,  l</3<nn. 

(29) 

oil 

nous  avons  pose 

/?=  E  It- 

(30) 

l<a<n» 


Although quite different from the Osher-Solomon upwind method, the van Leer scheme can nevertheless be given an analogous formulation in terms of a path. It can indeed be verified that it can be written as

$$f^{VL}(u_L, u_R) = \frac{1}{2}\Big( f(u_L) + f(u_R) - \int_{\Phi} \big(\nabla f^{+}(u) - \nabla f^{-}(u)\big)\,du \Big), \tag{31}$$

for any path $\Phi$ connecting $u_L$ and $u_R$ in the state space. This property underlies the technique for hybridizing the Osher-Solomon and van Leer methods [7], which we extend below to mixtures of gases in chemical and thermal nonequilibrium.
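The path-independence property of relation (31) can be checked numerically. The sketch below is only an illustration under simplifying assumptions that are not those of the paper: a single calorically perfect gas with `GAMMA = 1.4`, one space dimension, and an energy splitting taken as mass flux times total enthalpy (one member of the one-parameter family, chosen here because it keeps the total enthalpy constant). All function names are hypothetical.

```python
import math

GAMMA = 1.4  # assumption: single calorically perfect gas, not the multi-species model


def euler_flux(rho, u, p):
    """Physical flux f(u) of the 1D Euler equations (mass, momentum, energy)."""
    total_energy = p / (GAMMA - 1.0) + 0.5 * rho * u * u
    return (rho * u, rho * u * u + p, u * (total_energy + p))


def van_leer_split(rho, u, p):
    """Return (f_plus, f_minus) for one state, van Leer style."""
    c = math.sqrt(GAMMA * p / rho)
    mach = u / c
    zero = (0.0, 0.0, 0.0)
    if mach >= 1.0:          # fully supersonic to the right: f+ = f, f- = 0
        return euler_flux(rho, u, p), zero
    if mach <= -1.0:         # fully supersonic to the left: f+ = 0, f- = f
        return zero, euler_flux(rho, u, p)
    enthalpy = c * c / (GAMMA - 1.0) + 0.5 * u * u
    parts = []
    for s in (1.0, -1.0):
        mass = s * 0.25 * rho * c * (mach + s) ** 2          # same form as eq. (25)
        momentum = mass * ((GAMMA - 1.0) * u + s * 2.0 * c) / GAMMA
        parts.append((mass, momentum, mass * enthalpy))      # assumed energy splitting
    return parts[0], parts[1]


def van_leer_flux(left, right):
    """Usual form: f+(u_L) + f-(u_R)."""
    f_plus_l, _ = van_leer_split(*left)
    _, f_minus_r = van_leer_split(*right)
    return tuple(a + b for a, b in zip(f_plus_l, f_minus_r))


def van_leer_flux_path_form(left, right):
    """Form (31): the path integral only depends on the end states."""
    fl, fr = euler_flux(*left), euler_flux(*right)
    pl, ml = van_leer_split(*left)
    pr, mr = van_leer_split(*right)
    return tuple(0.5 * (a + b - ((c2 - c1) - (d2 - d1)))
                 for a, b, c1, c2, d1, d2 in zip(fl, fr, pl, pr, ml, mr))


if __name__ == "__main__":
    left, right = (1.0, 0.5, 1.0), (0.125, -0.2, 0.1)   # (rho, u, p), arbitrary test states
    print(van_leer_flux(left, right))
    print(van_leer_flux_path_form(left, right))
```

Because the integrand in (31) is an exact differential, the integral reduces to differences of the split fluxes at the two end states, which is why no path has to be constructed in the second function.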

3.5 Field-by-field hybrid upwind method

The introduction of hybrid upwinding was motivated by an analysis of the respective strengths and weaknesses of the Osher-Solomon and van Leer schemes. While van Leer's method proves very robust in capturing nonlinear waves (shocks and expansions), it is on the other hand very inaccurate in resolving linear waves (contact discontinuities). This lack of accuracy makes it unsuitable for the viscous flow equations considered here. Conversely, the Osher-Solomon method resolves stationary contact discontinuities exactly by construction. Alongside this advantage, however, it lacks robustness when capturing strong nonlinear waves. The strengths and weaknesses inherent in the two methods therefore turn out to be disjoint and complementary.

The hybridization technique sets out to exploit this complementarity, with the aim of combining the robustness of van Leer's method in resolving nonlinear waves with the accuracy of the Osher-Solomon scheme in resolving linear waves. Each of the three subpaths making up the Osher-Solomon path is thus associated either with van Leer's method or with Osher's method, according to whether or not the subpath considered is nonlinear. In what follows, we write $VNL(\Phi) = \Phi_1 \cup \Phi_3$ and $LD(\Phi) = \Phi_2$. The numerical flux resulting from the hybridization then reads

$$f^{HUS}(u_L, u_R) = \frac{1}{2}\Big( f(u_L) + f(u_R) - \int_{LD(\Phi)} |\nabla f(u)|\,du - \int_{VNL(\Phi(u_L,u_R))} \big(\nabla f^{+}(u) - \nabla f^{-}(u)\big)\,du \Big).$$

Note that, by construction, the hybrid flux coincides with the Osher flux in the presence of a contact discontinuity alone and, conversely, reduces to the van Leer flux when only nonlinear waves appear in the approximate Osher-Solomon wave decomposition. Using the notation of Section 3.3, with $u_L^{*}$ and $u_R^{*}$ the states at the two ends of the contact subpath and $v^{*}$ the velocity along it, the preceding relation can be written explicitly as

$$f^{HUS}(u_L, u_R) = f^{VL}(u_L, u_R) + \begin{cases} -\big(f^{-}(u_R^{*}) - f^{-}(u_L^{*})\big), & \text{if } v^{*} > 0, \\ +\big(f^{+}(u_R^{*}) - f^{+}(u_L^{*})\big), & \text{if } v^{*} \le 0. \end{cases} \tag{32}$$
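The step from the path-integral form of the hybrid flux to its explicit form can be sketched as follows. This is a reconstruction, not the authors' own derivation: it assumes that the contact subpath $\Phi_2$ joins two states written here $u_L^{*}$ and $u_R^{*}$, that the velocity $v^{*}$ is constant along it, and that $f = f^{+} + f^{-}$.

```latex
% Nonlinear subpaths: the integrand is an exact differential
\int_{\Phi_1}\big(\nabla f^{+}-\nabla f^{-}\big)\,du
  = \big[f^{+}(u_L^{*})-f^{+}(u_L)\big]-\big[f^{-}(u_L^{*})-f^{-}(u_L)\big],
\qquad
\int_{\Phi_3}\big(\nabla f^{+}-\nabla f^{-}\big)\,du
  = \big[f^{+}(u_R)-f^{+}(u_R^{*})\big]-\big[f^{-}(u_R)-f^{-}(u_R^{*})\big].

% Contact subpath: a single eigenvalue v^* of constant sign
\int_{\Phi_2}\lvert\nabla f\rvert\,du
  = \operatorname{sign}(v^{*})\,\big[f(u_R^{*})-f(u_L^{*})\big].

% Substituting in the hybrid flux and collecting terms
f^{HUS}(u_L,u_R) = \underbrace{f^{+}(u_L)+f^{-}(u_R)}_{f^{VL}(u_L,u_R)}
  + \tfrac12\big(1-\operatorname{sign}v^{*}\big)\big[f^{+}(u_R^{*})-f^{+}(u_L^{*})\big]
  - \tfrac12\big(1+\operatorname{sign}v^{*}\big)\big[f^{-}(u_R^{*})-f^{-}(u_L^{*})\big].
```

For $v^{*} > 0$ only the $f^{-}$ bracket survives, and for $v^{*} < 0$ only the $f^{+}$ bracket, which gives the two branches of the explicit form. For $v^{*} = 0$ the two branches agree, since the flux is then continuous across the stationary contact.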

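We stress the real simplicity of the hybrid flux in comparison with the original Osher method. In particular, sonic points do not enter. Moreover, the single test entering the formulation of the hybrid flux can be handled automatically by using the symmetry properties of the van Leer flux with respect to the Mach number. For the species-density, tangential-momentum, total-energy and vibrational-energy components, denoted generically by the subscript k, we thus have

$$f^{HUS}_{k} = f^{VL}_{k} + \big( f^{+}_{k}(-|M_R^{*}|) - f^{+}_{k}(-|M_L^{*}|) \big), \tag{33-36}$$

and, for the momentum component normal to the interface,

$$f^{HUS}_{\rho v_1} = f^{VL}_{\rho v_1} - \operatorname{sign}(v^{*})\,\big( f^{+}_{\rho v_1}(-|M_R^{*}|) - f^{+}_{\rho v_1}(-|M_L^{*}|) \big). \tag{37}$$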

Note that, like the Osher-Solomon scheme, the hybrid flux does not satisfy a maximum principle on the mass fractions and the specific vibrational energies. The correction proposed by Larrouturou [10] can be applied to it without degrading either robustness or accuracy. Finally, the hybridization technique can be given a much more general framework than the one presented here [8].

3.6 Second-order explicit method

The finite-volume method is extended to second-order accuracy in space by the classical MUSCL method of van Leer which, at each time step, relies on a piecewise-linear reconstruction of the approximate solution. The extension of the MUSCL method that we use for a gas mixture preserves the conservation of the elemental species and also ensures the positivity of the mass fractions and of the specific vibrational energies under certain CFL-type conditions in the case of an explicit scheme [13]. On the curvilinear meshes considered here, the method is applied one curvilinear direction at a time. The method uses the following variables:

$$\big( (Y_\alpha)_{1 \le \alpha \le n_s},\ \rho,\ v_1,\ v_2,\ p,\ (e_{v,\beta})_{1 \le \beta \le n_v} \big), \tag{38}$$

with the reconstruction procedure inhibited on the mass fractions. This strategy guarantees the conservation of the elemental species, which would otherwise generally be lost because of the nonlinearities inherent in the reconstruction procedure. The limiter function used is either the function proposed by van Albada or the minmod function.
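A limited piecewise-linear reconstruction of this kind can be sketched in one dimension as follows. This is a minimal illustration on a uniform grid, not the CELHYO implementation: the function names are hypothetical, the van Albada limiter is written in a common clipped form (zero slope when the two differences have opposite signs), and the `reconstruct` flag merely mimics switching the reconstruction off for selected variables such as the mass fractions.

```python
def minmod(a, b):
    """Minmod limiter: the smaller of two same-sign differences, else zero."""
    if a * b <= 0.0:
        return 0.0
    return a if abs(a) < abs(b) else b


def van_albada(a, b):
    """Clipped van Albada-type limiter (assumed variant): smooth average of a and b."""
    if a * b <= 0.0:
        return 0.0
    return a * b * (a + b) / (a * a + b * b)


def muscl_interface_states(q, limiter=minmod, reconstruct=True):
    """Return (left, right) states at the interfaces i+1/2 for i = 1 .. len(q)-3.

    q holds cell averages on a uniform grid. With reconstruct=False all slopes
    are zero and the scheme falls back to first order for that variable.
    """
    n = len(q)
    slopes = [0.0] * n
    if reconstruct:
        for i in range(1, n - 1):
            slopes[i] = limiter(q[i] - q[i - 1], q[i + 1] - q[i])
    return [(q[i] + 0.5 * slopes[i], q[i + 1] - 0.5 * slopes[i + 1])
            for i in range(1, n - 2)]


if __name__ == "__main__":
    print(muscl_interface_states([0.0, 1.0, 2.0, 3.0, 4.0]))             # smooth data
    print(muscl_interface_states([0.0, 0.0, 0.0, 1.0, 1.0, 1.0]))        # step data
    print(muscl_interface_states([0.0, 1.0, 2.0, 3.0, 4.0], reconstruct=False))
```

On linear data both limiters return the exact slope, so the left and right interface states coincide; across a step the slopes vanish and no new extremum is created.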

4. Implicit method

The implicit scheme is built by linearizing the numerical fluxes and the source terms. The implicit treatment of the inviscid flux terms is based on van Leer's Flux Vector Splitting method, independently of the explicit flux used. The resulting robustness is a priori largely insensitive to the nature of the explicit scheme. The source term is treated in a centered manner. Cross-derivative terms are neglected when linearizing the viscous fluxes. Since a good implicit treatment of the boundary conditions determines how well convergence to the steady state is accelerated, particular attention was paid to it. The implicit operator thus obtained is linear and is solved by an iterative method. Such a method has the advantage of being far less sensitive to the choice of time step than approximate-factorization methods. It also avoids the sometimes inadequate splitting of the Jacobian matrix of the source terms. The iterative method implemented is based on a residual-minimization strategy such as the GMRES method. Since the iterative method converges better the better conditioned the system is, an ILU factorization is used as a preconditioner.

5. Numerical results

To illustrate the ability of the method to compute nonequilibrium flows in varied configurations, computations were carried out for external flows around a hyperboloid-flare configuration and for internal flows in a nozzle of ONERA's F4 high-enthalpy wind tunnel.

5.1 Flow in a nozzle of the F4 wind tunnel

The ONERA F4 wind tunnel can be fitted with four different nozzles, each corresponding to a different flow regime. The conditions of the computations presented here correspond to test case number 1 of the "Fourth European High Velocity Database Workshop" (held on 24-25 November 1994 in Noordwijk). The geometry is that of nozzle no. 2. Its total length is 3.42 m and the throat radius is 0.005 m. The reservoir conditions correspond to a reduced total enthalpy of 260 and a pressure of 430 bar. The wall temperature is 300 K. The wall is assumed to be fully catalytic up to a distance of 0.5 m downstream of the throat, and non-catalytic beyond.

The computational domain was divided into eight parts. In the first domain, the flow in the convergent section and in the region near the throat is computed. These results are then used to determine the solution in the subsequent zones of the hypersonic flow. Each domain comprises 89x85 points.

A laminar computation and a turbulent computation, denoted (1) and (2) respectively, were carried out. For the turbulent case, the turbulence model used is the algebraic Baldwin-Lomax model and the transition point is located 0.5 m downstream of the throat. The results presented were obtained after 6000 iterations in the first domain. For the other domains, about 600 to 200 iterations were needed, depending on the domain considered. The Courant number can reach a value of 500 (global time step) in the last domains of the divergent section. In all cases, the maximum residuals decrease by at least 10 orders of magnitude.

Figure 1 shows the complete mesh used for the nozzle. The results of the laminar and turbulent computations are presented in figures 2 to 6. The Mach number field is shown in figure 2 for the laminar flow and reveals a wave that disturbs the flow near the nozzle axis. This wave originates at an inflection point of the geometry. The next figure shows the temperature distributions along the axis for the laminar and turbulent cases; no notable effect of turbulence on these distributions can be observed. The transverse Mach number distributions at the nozzle exit are shown in figure 4 for the laminar and turbulent cases. Good uniformity of the Mach number is observed in the core of the flow.

5.2 External flow computations

Two series of computations were carried out around a hyperboloid-flare configuration. This geometry was proposed for test case number 4 of the Workshop. The total length of the model is 0.1114 m and the angle between the flare and the axis is 43.6 degrees. The first computation corresponds to the flow conditions in nozzle no. 2 of the F4 wind tunnel, with a reduced total enthalpy of 122 and a stagnation pressure of 441 bar (i.e. the conditions of test case number 4):

$T_\infty$ = 187 K;

$T_{v,N_2}$ = 4078 K; $T_{v,O_2}$ = 2485 K;

$\rho_\infty$ = 1.557; $U_\infty$ = 3934 m/s; $T_w$ = 300 K;

$C_{N_2}$ = 0.7254; $C_{O_2}$ = 0.1354; $C_{NO}$ = 0.0895; $C_N$ = 10^-20; $C_O$ = 0.0497.

The wall is assumed to be fully catalytic.

The second computation corresponds to flight conditions on a geometry scaled up from the previous one by a factor of 1.4:

$T_\infty$ = 268 K;

$\rho_\infty$ = 2.608 x 10^-3 kg/m^3; $U_\infty$ = 5083 m/s; $T_w$ = 1000 K; $p_\infty$ = 201.5 Pa.

The wall is again assumed to be fully catalytic.

The same mesh is used for both computations, with the difference in scale taken into account. It contains 401x110 points in total. Three subdomains were used in order to reduce the computing time and the memory required. The domains overlap by four points. These domains (nose, intermediate region and flare region) comprise 80x110, 123x110 and 206x110 points respectively. For the nose and the intermediate region, the explicit quadratic residuals decrease by 8 orders of magnitude after 2000 iterations. The Courant number reaches 10 for the nose and 70 for the second zone. In the flare region, the residuals decrease by 5 orders of magnitude after 20000 iterations with a Courant number of 10. Note that the residuals do not reach a plateau and continue to decrease when the computations are pursued. This slow convergence is due to the presence of a large recirculation zone in the flare region.

The results are presented in figures 5 to 17. Figures 5 to 9 show contours of Mach number and pressure for the two computations. In the flare region, the separation zone is well defined in both computations (figures 5 to 7). Figure 6 shows an enlargement of this zone for the flight case. Viscous effects are significant because of the small nose radius. Slight oscillations are visible on the pressure contours; they correspond to jumps from one mesh cell to the next in the shock region. The pressure reaches a maximum value of 22432 Pa for the wind-tunnel case and 63073 Pa for the flight case.

The flow is relatively frozen behind the shock, as shown by the temperature profiles obtained for the first computation (figure 10). The shock stand-off distance in this case is 3.7 x 10^-[?] m. The following figures show wall values for the two computations. The Stanton number along the body is shown in figure 11 for the wind-tunnel computation. The maximum value (0.284) corresponds to a heat flux of 1.43 x 10^[?] W/m2. The next four figures show, in the flare region, the Stanton numbers and the skin-friction coefficients for the two computations. The separation zone measures about 1.1 x 10^-[?] m for the wind-tunnel case and 2 x 10^-[?] m for the flight case. A secondary vortex can be observed in the flight case at the hinge line with the flare. Finally, we show the convergence curves in the intermediate region and in the flare region (figures 16 and 17).

6.  Conclusion 

We have presented the numerical methods used in the CELHYO code for computing flows in thermal and chemical nonequilibrium. The code allows the use of the Roe, Osher and van Leer schemes and of their hybridization. Emphasis was placed on the hybrid upwinding technique. This technique combines the two classical approaches so as to retain only the properties deemed favourable for the numerical simulation of flows in which significant nonlinear and linear phenomena coexist. This is the case in particular for the viscous high-enthalpy flows presented. The implicit operator is built on the van Leer fluxes and is inverted by an iterative method of GMRES type. The code can compute a variety of two-dimensional and axisymmetric high-enthalpy flow configurations, using accurate numerical schemes and achieving good convergence.

Fig. 15 - Distribution of the skin-friction coefficients

Fig. 16 - Residual decay curve

SUMMARY

In this paper, an implicit projection methodology for the solution of the two-dimensional, time-dependent, incompressible Navier-Stokes equations is presented. The basic principle of this method is that the evaluation of the time evolution is split into intermediate steps. The computational method is based on the approximate factorization technique. The coupled approach is used to link the equations of motion and the turbulence model equations. The standard k-ε turbulence model is used. The current methodology, which has been tested extensively for steady problems, is now applied to the numerical simulation of unsteady flows. Several cases were tested, such as plane or axisymmetric channels, a backward-facing step and the flow behind a square cylinder.

1.  INTRODUCTION 

The numerical prediction of unsteady incompressible flowfields has always been one of the most challenging areas of fluid dynamics. The primary difficulty is in finding a satisfactory way to link changes in the velocity field to changes in the pressure field. This interaction must be accomplished in such a manner as to ensure that the divergence of the velocity vanishes at each level of physical time. The most common solutions to this problem are the use of an artificial compressibility methodology or of a projection methodology.

The projection method for the solution of the time-dependent Navier-Stokes equations was introduced independently by Chorin (Ref 1) and Temam (Ref 2). Subsequently, an explicit version of the method was presented by Fortin et al (Ref 3). The projection method is an interpretation of a fractional-step method as adapted to the unsteady Navier-Stokes equations (Ref 4).

The advance from one physical time level to the next is split into two steps. Following the decomposition of Chorin (Ref 1), a tentative velocity field is first calculated from the discretized momentum equations without the pressure gradient. In the second step, the velocity components at the new time level are evaluated by correcting the tentative solution in order to satisfy the incompressibility constraint.
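The correction step can be illustrated with a small sketch. It is not the method of this paper: it assumes a uniform, doubly periodic collocated grid, central differences, a plain Gauss-Seidel iteration for the Poisson equation (the paper uses ADI on curvilinear grids), and a tentative field supplied by the caller. The names `divergence` and `project` are hypothetical, and `phi` stands for the pressure divided by the density.

```python
import math


def divergence(u, v, h):
    """Central-difference divergence on a periodic n x n grid (u[i][j], i along x)."""
    n = len(u)
    return [[(u[(i + 1) % n][j] - u[(i - 1) % n][j]
              + v[i][(j + 1) % n] - v[i][(j - 1) % n]) / (2.0 * h)
             for j in range(n)] for i in range(n)]


def project(u_star, v_star, h, dt, sweeps=1500):
    """Correct a tentative velocity field so that its discrete divergence vanishes.

    Solves lap(phi) = div(u*) / dt with Gauss-Seidel, then sets
    u = u* - dt * grad(phi). The Laplacian is the wide (2h) stencil, i.e. the
    composition of the central-difference divergence and gradient, so the
    corrected field is discretely divergence-free once the solve has converged.
    An odd n is assumed so that the wide stencil does not decouple the grid.
    """
    n = len(u_star)
    div = divergence(u_star, v_star, h)
    phi = [[0.0] * n for _ in range(n)]
    for _ in range(sweeps):
        for i in range(n):
            for j in range(n):
                neighbours = (phi[(i + 2) % n][j] + phi[(i - 2) % n][j]
                              + phi[i][(j + 2) % n] + phi[i][(j - 2) % n])
                phi[i][j] = 0.25 * (neighbours - 4.0 * h * h * div[i][j] / dt)
    u = [[u_star[i][j] - dt * (phi[(i + 1) % n][j] - phi[(i - 1) % n][j]) / (2.0 * h)
          for j in range(n)] for i in range(n)]
    v = [[v_star[i][j] - dt * (phi[i][(j + 1) % n] - phi[i][(j - 1) % n]) / (2.0 * h)
          for j in range(n)] for i in range(n)]
    return u, v, phi


if __name__ == "__main__":
    n = 9
    h = 2.0 * math.pi / n
    u_star = [[math.sin(i * h) + math.cos(i * h) * math.sin(j * h) for j in range(n)]
              for i in range(n)]
    v_star = [[math.cos(j * h) - math.sin(i * h) * math.cos(j * h) for j in range(n)]
              for i in range(n)]
    u_new, v_new, _ = project(u_star, v_star, h, dt=0.1)
    print(max(abs(d) for row in divergence(u_star, v_star, h) for d in row))
    print(max(abs(d) for row in divergence(u_new, v_new, h) for d in row))
```

The sketch also shows why the Poisson equation has to be converged at every step in unsteady computations: any residual left in the solve appears directly as a divergence error in the corrected velocity.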

The solution algorithm used in the present study is the approximate factorization technique. This is an implicit algorithm which was initially developed by Beam and Warming (Ref 5) for compressible flows but has also been used successfully for incompressible steady flows (Ref 6, 7). Regarding the mathematical model, a projection method is developed which uses a Poisson equation for the explicit pressure derivation, while the numerical algorithm involves only the momentum equations.

Concerning the turbulence model, there are plenty of options. The standard k-ε model with wall functions (Ref 8) was selected because it is well tested and widely used, in spite of its disadvantages. In addition, small values of y-plus are not required, so coarse grids can be used near the walls and thus large time steps are possible. It is expected that this turbulence model will sometimes perform poorly, especially in recirculation zones.

The objective of this paper is to describe a new projection methodology developed for collocated grids and to present predictions for several test cases where the unsteadiness is either forced or inherent.

2.  THE  GOVERNING  EQUATIONS 

The full form of the momentum equations is used, where all variables are in non-dimensional form. For turbulent flows, the high-Reynolds-number form of the k-ε model (Ref 8) is used.

This formulation requires the use of wall functions to bridge the viscous and boundary layers in proximity to the solid wall. This approach is strictly valid only for attached shear layers and may perform poorly in recirculation zones. In addition, this model is valid under the hypothesis of equilibrium and may not perform satisfactorily in unsteady flows.

On the other hand, experimental observations have shown that the general behaviour of the boundary layer and the structure of the turbulence are not fundamentally affected by the unsteadiness of the flow (Ref 9, 10, 11). On the basis of these observations it is reasonable to suppose that the hypotheses used in calculation methods for the steady case remain valid for the unsteady case.

The reference quantities are a reference velocity $u_{ref}$, a reference length $L_{ref}$, a reference density $\rho_{ref}$ and a reference kinematic viscosity $\nu_{ref}$. The reference value for the time is defined as $t_{ref} = L_{ref}/u_{ref}$ and for the pressure it is the product $p_{ref} = \rho_{ref}\,u_{ref}^{2}$. The reference quantity for the turbulent kinetic energy is $u_{ref}^{2}$ and for the dissipation rate $u_{ref}^{3}/L_{ref}$.

Performing a generalised coordinate transformation from the physical (x, y, t) to the computational (ξ, η, τ) domain, the following non-dimensional form of the equations is obtained (Ref 12):

$$\partial_\tau Q + \partial_\xi F + \partial_\eta G + \alpha E + K = \partial_\xi V + \partial_\eta W + \alpha C + D$$

where α=0 for the two-dimensional equations, α=1 for the axisymmetric equations, and the subscripts x, y, ξ, η, τ denote differentiation. For convenience we express the above equation in the following form:

$$\partial_\tau Q + K = [f(u, v)] \tag{1}$$

where

$$[f(u, v)] = \partial_\xi V + \partial_\eta W + \alpha C + D - \partial_\xi F - \partial_\eta G - \alpha E$$

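In equation (1), Q is the vector of the conservative variables:

$$Q = \frac{1}{J}\,[\,u,\ v,\ k,\ \varepsilon\,]^{T}$$

F, G, E are the convective fluxes:

$$F = \frac{1}{J}\Big[\,uU + \tfrac{2}{3}\xi_x k,\ \ vU + \tfrac{2}{3}\xi_y k,\ \ kU,\ \ \varepsilon U\,\Big]^{T}$$

$$G = \frac{1}{J}\Big[\,uV + \tfrac{2}{3}\eta_x k,\ \ vV + \tfrac{2}{3}\eta_y k,\ \ kV,\ \ \varepsilon V\,\Big]^{T}$$

$$E = \frac{v}{J\,y}\,[\,u,\ v,\ k,\ \varepsilon\,]^{T}$$

V, W, C are the viscous fluxes:

$$V = \frac{1}{J\,Re}\Big[\,\xi_x\tau_{xx} + \xi_y\tau_{xy},\ \ \xi_x\tau_{xy} + \xi_y\tau_{yy},\ \ \Gamma_k(\xi_x k_x + \xi_y k_y),\ \ \Gamma_\varepsilon(\xi_x \varepsilon_x + \xi_y \varepsilon_y)\,\Big]^{T}$$

$$W = \frac{1}{J\,Re}\Big[\,\eta_x\tau_{xx} + \eta_y\tau_{xy},\ \ \eta_x\tau_{xy} + \eta_y\tau_{yy},\ \ \Gamma_k(\eta_x k_x + \eta_y k_y),\ \ \Gamma_\varepsilon(\eta_x \varepsilon_x + \eta_y \varepsilon_y)\,\Big]^{T}$$

[The expression for the axisymmetric viscous vector C is not legible in the source.]

D is a vector that contains the source terms of the k and ε equations:

$$D = \frac{1}{J}\Big[\,0,\ \ 0,\ \ C_\mu\frac{k^{2}}{\varepsilon}\,\mathcal{G} - \varepsilon,\ \ C_1 C_\mu k\,\mathcal{G} - C_2\frac{\varepsilon^{2}}{k}\,\Big]^{T}$$

and, finally,

$$K = \frac{1}{J}\Big[\,\partial_x\Big(\frac{p}{\rho}\Big),\ \ \partial_y\Big(\frac{p}{\rho}\Big),\ \ 0,\ \ 0\,\Big]^{T}$$

is a matrix that contains the pressure derivatives of the momentum equations.

In the expressions above, ξ, η are the curvilinear coordinates, connected to the Cartesian ones x, y through the generalised coordinate transformation:

$$\xi = \xi(x, y, t), \qquad \eta = \eta(x, y, t), \qquad \tau = t$$

and J is the Jacobian of the transformation:

$$J = \xi_x\eta_y - \xi_y\eta_x$$

In addition, U, V are the contravariant velocities along the ξ, η directions respectively, given by the following relations:

$$U = \xi_t + \xi_x u + \xi_y v, \qquad V = \eta_t + \eta_x u + \eta_y v$$

Re is the Reynolds number and $\mathcal{G}$ is the kinetic energy production term:

$$\mathcal{G} = 2\big[(u_x)^{2} + (v_y)^{2}\big] + (u_y + v_x)^{2}$$

The stresses are:

$$\tau_{xx} = 2\nu_{eff}\,u_x, \qquad \tau_{xy} = \tau_{yx} = \nu_{eff}\,(u_y + v_x), \qquad \tau_{yy} = 2\nu_{eff}\,v_y$$

[a further stress term, $2\nu_{eff}\,v/y$, also appears; its symbol is not legible in the source]

where $\nu_{eff}$ is the effective viscosity. Finally, the diffusion coefficients of the turbulence model equations are:

$$\Gamma_k = \nu_\ell + \frac{\nu_t}{\sigma_k}, \qquad \Gamma_\varepsilon = \nu_\ell + \frac{\nu_t}{\sigma_\varepsilon}$$

where $\nu_\ell$ is the kinematic viscosity and $\nu_t$ is the turbulent viscosity, which is given by the relation:

[relation not legible in the source]

The constants are:

$$C_\mu = 0.09, \quad C_1 = 1.44, \quad C_2 = 1.92, \quad \sigma_k = 1.0, \quad \sigma_\varepsilon = 1.3$$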
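As an illustration of how these constants enter the model, the sketch below evaluates the production term and the k and ε source terms at a single point. It is a hedged example, not the paper's formulation: it uses the standard dimensional relation for the eddy viscosity, nu_t = C_mu k^2 / eps, which is an assumption here (the non-dimensional form used in the paper may carry an additional Reynolds-number factor), and the function names are hypothetical.

```python
C_MU, C_1, C_2 = 0.09, 1.44, 1.92      # constants of the standard k-epsilon model
SIGMA_K, SIGMA_EPS = 1.0, 1.3          # turbulent Prandtl numbers for k and epsilon


def production(u_x, u_y, v_x, v_y):
    """Kinetic-energy production term 2(u_x^2 + v_y^2) + (u_y + v_x)^2."""
    return 2.0 * (u_x ** 2 + v_y ** 2) + (u_y + v_x) ** 2


def k_epsilon_sources(k, eps, u_x, u_y, v_x, v_y):
    """Return (nu_t, source of k, source of epsilon) at one point.

    Assumes nu_t = C_mu * k^2 / eps (standard dimensional form).
    """
    g = production(u_x, u_y, v_x, v_y)
    nu_t = C_MU * k * k / eps
    source_k = nu_t * g - eps
    source_eps = C_1 * C_MU * k * g - C_2 * eps * eps / k
    return nu_t, source_k, source_eps


if __name__ == "__main__":
    print(k_epsilon_sources(k=1.0, eps=1.0, u_x=1.0, u_y=0.0, v_x=0.0, v_y=-1.0))
```

With this eddy-viscosity relation, the usual production term of the ε equation, C_1 (eps / k) nu_t G, simplifies to C_1 C_mu k G, which is the form used in the code.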

For the above model the concept of wall functions has been employed. The central idea is that the flow in the region near the wall can be assumed to behave as a one-dimensional Couette flow. This is a reasonable assumption except for regions of high pressure gradient, separation or reattachment. Once this assumption is made, it is rather easy to arrive at exact or semi-empirical relations (Ref 8, 14) which link the shear stresses and the other variables at the wall to the values of velocity, turbulence energy, etc. at the outer edge of the Couette layer, where the first interior grid point is located.

3.  NUMERICAL  ALGORITHM 

The time marching scheme

For the solution of the system of equations (1), the implicit, factored, finite-difference scheme of Beam and Warming (Ref 5) is used. The temporal derivative in equation (1) is approximated via a generalized time differencing:

$$\partial_\tau Q^{n} = \frac{1}{\Delta\tau}\,\frac{(1+\zeta)\Delta - \zeta\nabla}{1 + \theta\Delta}\,Q^{n}$$

which takes the form:

$$\frac{\Delta Q^{n}}{\Delta\tau} = \frac{\theta}{1+\zeta}\,\partial_\tau Q^{n+1} + \frac{1-\theta}{1+\zeta}\,\partial_\tau Q^{n} + \frac{\zeta}{1+\zeta}\,\frac{\Delta Q^{n-1}}{\Delta\tau} + O\Big[\big(\theta - \tfrac{1}{2} - \zeta\big)\Delta\tau + \Delta\tau^{2}\Big] \tag{2}$$

where Δ and ∇ are the forward and backward differencing operators, respectively, the superscript n denotes the time instant and O denotes the order of the truncation error.

After substituting (1) into (2) and performing the calculations, the following relation is derived:

$$\frac{\Delta Q^{n}}{\Delta\tau} = \frac{\theta}{1+\zeta}\Big([f(u,v)]^{n+1} - K^{n+1}\Big) + \frac{1-\theta}{1+\zeta}\Big([f(u,v)]^{n} - K^{n}\Big) + \frac{\zeta}{1+\zeta}\,\frac{\Delta Q^{n-1}}{\Delta\tau} + O\Big[\big(\theta - \tfrac{1}{2} - \zeta\big)\Delta\tau + \Delta\tau^{2}\Big]$$

Using a fractional step method similar to that described by Anderson and Kristoffersen (Ref 13), the above relation is split in two parts:

$$\frac{Q^{*} - Q^{n}}{\Delta\tau} = \frac{\theta}{1+\zeta}\,[f(u,v)]^{*} + \frac{1-\theta}{1+\zeta}\,[f(u,v)]^{n} + \frac{\zeta}{1+\zeta}\,\frac{\Delta Q^{n-1}}{\Delta\tau} + O\Big[\big(\theta - \tfrac{1}{2} - \zeta\big)\Delta\tau + \Delta\tau^{2}\Big] \tag{3}$$

and

$$\frac{Q^{n+1} - Q^{*}}{\Delta\tau} = \frac{\theta}{1+\zeta}\Big([f(u,v)]^{n+1} - [f(u,v)]^{*}\Big) - \frac{\theta}{1+\zeta}\,K^{n+1} - \frac{1-\theta}{1+\zeta}\,K^{n}$$

where $Q^{*}$ is an intermediate, or tentative, flowfield. Using equation (1), the above relation is rewritten [intermediate form not legible in the source] and, after some simple calculations and an assumption on $K^{n+1}$ [right-hand side of the assumption not legible in the source], the following is obtained:

$$\frac{Q^{n+1} - Q^{*}}{\Delta\tau} = -\frac{\theta}{1+\zeta-\theta}\,K^{n+1} - \frac{1-\theta}{1+\zeta-\theta}\,K^{n} \tag{4}$$

Equation (4) imposes the condition $1+\zeta-\theta \neq 0$. Thus we use θ=1 and ζ=0.5, which leads to the second-order three-point backward scheme. Equation (3) is actually the same as (2), except that it contains equation (1) without the pressure gradients, and is written in terms of the increment

$$\Delta Q^{*} = Q^{*} - Q^{n}$$

A non-linear expression, eq. (3), occurs for the time increment of the conservative variables vector $\Delta Q^{*}$ (Ref 12, 14). In order to derive a linear algebraic system of equations, a linearization of the viscous and inviscid fluxes must be performed. The inviscid fluxes, which are functions of Q, are linearized using a Taylor series expansion, for example:

$$\Delta F^{n} = A^{n}\,\Delta Q^{n} + O(\Delta\tau^{2})$$

where $A^{n} = \partial F^{n}/\partial Q^{n}$ is the Jacobian matrix of the vector $F^{n}$.

The above linearization of the inviscid fluxes ensures the second-order time accuracy of the scheme. In order for this accuracy to be retained in the corresponding linearization of the viscous fluxes, it must be taken into account that the latter are functions of all of $Q$, $Q_\xi$, $Q_\eta$, for example:

$$V^{n}(Q, Q_\xi, Q_\eta) = V_1^{n}(Q, Q_\xi) + V_2^{n}(Q, Q_\eta)$$

The linearization of the matrix $V_1$ leads to the following relation:

$$\Delta V_1^{n} = (P^{n} - R_\xi^{n})\,\Delta Q^{n} + (R^{n}\,\Delta Q^{n})_\xi + O(\Delta\tau^{2})$$

while the matrix $V_2^{n}$ is treated in an explicit way:

$$\Delta V_2^{n} = \Delta V_2^{n-1} + O(\Delta\tau^{2})$$

where $P^{n}$ and $R^{n}$ are the Jacobian matrices. A detailed description of all the linearizations is given in Ref 14.

The substitution of the linear expressions of the flux vectors into the original non-linear equation for $\Delta Q^{*}$ leads to a strongly coupled system of equations in both spatial directions. This coupled system is solved by the Approximate Factorization Technique (Ref 5, 14), which leads to the following two block tridiagonal systems, one for each of the two directions ξ, η:

[Equations (5a) and (5b), the ξ-sweep and η-sweep systems, are not legible in the source.]

where A, B, P, Y, R, S, N1, N2, N3, T and H are Jacobian matrices (Ref 14), and

$$\hat{Q}^{*} = \hat{Q}^{n} + J\cdot\Delta Q^{*} \tag{5c}$$

$$\text{R.H.S.} = \frac{\Delta\tau}{1+\zeta}\Big[\partial_\xi(-F + V)^{n} + \partial_\eta(-G + W)^{n} + \alpha(C - E)^{n} + D^{n}\Big] + \frac{\theta\,\Delta\tau}{1+\zeta}\Big[\ \cdots\ \Big] + \frac{\zeta}{1+\zeta}\,\Delta Q^{n-1} + D_e + O\Big[\big(\theta - \tfrac{1}{2} - \zeta\big)\Delta\tau^{2} + \Delta\tau^{3}\Big] \tag{5d}$$

[the content of the second bracket in (5d) is not legible in the source]

where $\hat{Q} = JQ$ is the vector of conservative variables in the physical domain, $D_e$ denotes the artificial dissipation terms (Ref 14), and $\omega_\xi$, $\omega_\eta$ are weighting functions (Ref 15) used to add the Jacobian matrix H in both sweeps.
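The role of the two parameters of the generalized time differencing can be illustrated on a scalar model problem. The sketch below is an assumption-laden illustration, not part of the paper: it applies the two-parameter Beam-Warming formula to dQ/dt = lam * Q, starts from an exact value at the first step so that the previous increment is available, and estimates the observed order of accuracy. The function names are hypothetical.

```python
import math


def integrate(theta, zeta, lam, dt, n_steps):
    """Advance dQ/dt = lam * Q, Q(0) = 1, over n_steps steps of size dt.

    Per step, with dQ = Q^{n+1} - Q^n and dQ_old = Q^n - Q^{n-1}:
        (1 + zeta - theta*dt*lam) * dQ = dt*lam*Q^n + zeta*dQ_old
    theta = 1, zeta = 0.5 is the three-point backward scheme,
    theta = 1, zeta = 0 is backward Euler, theta = 0.5, zeta = 0 is trapezoidal.
    """
    q_prev = 1.0
    q = math.exp(lam * dt)            # exact start-up value for the first step
    for _ in range(n_steps - 1):
        dq_old = q - q_prev
        dq = (dt * lam * q + zeta * dq_old) / (1.0 + zeta - theta * dt * lam)
        q_prev, q = q, q + dq
    return q


def observed_order(theta, zeta, lam=-1.0, n=100):
    """Estimate the order from the errors at t = 1 with n and 2n steps."""
    exact = math.exp(lam)
    err_coarse = abs(integrate(theta, zeta, lam, 1.0 / n, n) - exact)
    err_fine = abs(integrate(theta, zeta, lam, 0.5 / n, 2 * n) - exact)
    return math.log(err_coarse / err_fine, 2)


if __name__ == "__main__":
    for name, theta, zeta in (("three-point backward", 1.0, 0.5),
                              ("backward Euler", 1.0, 0.0),
                              ("trapezoidal", 0.5, 0.0)):
        print(name, observed_order(theta, zeta))
```

The observed orders reflect the leading truncation term, which is proportional to (theta - 1/2 - zeta): it vanishes for the three-point backward and trapezoidal choices and not for backward Euler.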

The  Poisson  eguation 

Equation  (4)  leads  to  the  following  rela¬ 
tions  (9=1)  : 


u 


n+ 1 


u'  -  Ax 
v*  -  Ax 


0 


1  +  ^-0 

0 

1  +  C  -  0 


k  ,  e”  =  e 


(6a) 

{6b) 

(6c) 


Assuming  that  the  continuity  equation  is 
satisfied  at  the  n+1  time  instant: 


V  •  5“'  =  0 


the  first  two  of  (6)  are  combined  to  give 
the  Poisson  equation: 


1  -h  ^  -  e 

6  Ax 


V  •  5' 


Ax 


a.Ax .  a|-J 


-f-  5yAx  • 


(7) 


oscillations  from  the  solution  are  re¬ 
moved.  In  the  present  work  only  explicit 
terms  Dg  are  used  in  (5).  These  terms  are 
a  blended  second  and  fourth  order  non¬ 
linear  model  which  is  widely  used  in  com¬ 
pressible  flows  (Ref  16,  17,  18,  19)  and 

was  used  for  the  first  time  in  incom¬ 
pressible  flows  by  Pentaris  et  al  (Ref 
14),  where  is  proved  that  the  existence  of 
the  second  order  dissipation  terms  do  not 
affect  the  spatial  accuracy  of  the  method. 

The  definition  of  the  time  step 


Although  the  solution  method  is  implicit, 
the  actual  stability  of  the  scheme  is  not 
independent  of  the  time  step  used.  In  this 
work  small  time  steps  are  used  which  help 
the  fast  convergence  of  the  Poisson  equa¬ 
tion.  When  a  problem  with  oscillating  flow 
rate  is  to  be  simulated,  the  Navier-Stokes 
equations  must  be  integrated  for  as  many 
cycles  as  are  needed  to  reach  a  periodic 
steady  state,  if  such  a  state  exists.  In 
the  periodic  steady  state,  of  period  T, 
the  solutions  at  time  instants  t  and  t+T 
must  reach  a  specified  convergence  crite¬ 
rion,  which  in  the  present  work  is  1x10  . 

With  the  present  method  this  criterion  is 
reached  at  the  second  period,  because 
10000  time  intervals  are  used  per  period. 
Using  less  time  intervals  per  period,  more 
iterations  are  needed  for  the  convergence 
of  the  Poisson  equation.  In  addition  more 
periods  are  necessary  to  reach  the  above 
criterion  and  thus  the  total  computational 
cost  is  increased. 

When  a  problem  with  steady  upstream  condi¬ 
tions  is  solved,  where  the  Poisson  equa¬ 
tion  is  rapidly  converged,  the  time  step 
is  essential  to  be  as  large  as  possible. 
Then  the  time  step  is  defined  as: 


where  c  =  (u,  v)  is  the  velocity  vector. 

The procedure used is the following. First,
the time-marching scheme of (5) is solved
to provide the tentative velocity
components u, v and the turbulence
variables k, ε. Next, the Poisson equation
(7) is solved using the classic ADI method
and the pressure field is obtained.
Finally, the velocity components at the new
time level are evaluated by correcting the
tentative velocity field using (6a) and
(6b). For unsteady flows it is essential to
fully converge the Poisson equation at each
time step so that mass conservation is
satisfied.
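
The pressure-correction part of this procedure can be illustrated with a minimal sketch. It is not the scheme of the paper: a doubly periodic square domain and a spectral (FFT) Poisson solver are assumed here in place of the collocated grid and the ADI solver, and the function names are hypothetical. It only shows how a tentative velocity field is made divergence-free by solving a Poisson equation for the pressure and correcting the velocity.

```python
import numpy as np

def _wavenumbers(n, length):
    """Angular wavenumber grids for an n x n periodic domain (assumed setup)."""
    k = 2 * np.pi * np.fft.fftfreq(n, d=length / n)
    return np.meshgrid(k, k, indexing="ij")

def divergence(u, v, length=2 * np.pi):
    """Spectral divergence du/dx + dv/dy, with x along axis 0."""
    kx, ky = _wavenumbers(u.shape[0], length)
    return np.fft.ifft2(1j * kx * np.fft.fft2(u) + 1j * ky * np.fft.fft2(v)).real

def project(u_star, v_star, dt, length=2 * np.pi):
    """Correct a tentative velocity field so that it satisfies continuity."""
    kx, ky = _wavenumbers(u_star.shape[0], length)
    k2 = kx**2 + ky**2
    k2[0, 0] = 1.0                                   # avoid 0/0 for the mean mode
    u_hat, v_hat = np.fft.fft2(u_star), np.fft.fft2(v_star)
    div_hat = 1j * kx * u_hat + 1j * ky * v_hat      # divergence of tentative field
    p_hat = -div_hat / (dt * k2)                     # Poisson: laplacian(p) = div(u*) / dt
    p_hat[0, 0] = 0.0                                # pressure is fixed up to a constant
    u = np.fft.ifft2(u_hat - dt * 1j * kx * p_hat).real   # u = u* - dt * dp/dx
    v = np.fft.ifft2(v_hat - dt * 1j * ky * p_hat).real   # v = v* - dt * dp/dy
    return u, v, np.fft.ifft2(p_hat).real
```

Because the Poisson equation is solved exactly in this sketch, the corrected field is divergence-free to round-off. With an iterative solver the residual of the Poisson equation bounds the remaining mass error, which is why full convergence at each time step matters for unsteady flows.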

The  artificial  dissipation  terms 

The  spatial  derivatives  in  the  above  sys¬ 
tem  of  equations  are  approximated  by  three 
point  central  second  order  differencing 
expressions.  So  the  solution  of  the  system 
of  equations  (5)  requires  the  inversion  of 
two  block  tridiagonal  systems,  one  in  each 
direction.  On  the  other  hand,  the  use  of 
central  differences  on  collocated  grids 
leads  to  the  necessity  of  adding  external 
artificial  dissipation  terms,  so  that  the 
stability  is  retained  and  high  frequency 


1  -t- 

where  is  the  maximum  of  all  the  Jaco- 

bians  in  the  computational  domain  and  CFL 
is  the  Courant  number. 

4.  BOUNDARY  CONDITIONS 

The use of a collocated grid allows
suitable boundary conditions to be imposed
in a convenient form. Throughout the
computations, explicit boundary conditions
are used. For the Poisson equation these
conditions are derived by integrating
equation (7) over the solution domain and
applying Gauss's theorem (Ref 20):

r^fpT^'  -  i-i-C-0f--  - 

;  IpJ  0Ax  1 

where the last part of equation (7)
vanishes for unsteady flows because the
time step is constant in the entire domain.
In the equation above, n is the outward
unit vector normal to the boundary A which
encloses the solution domain.

Concerning the other variables, the
velocity profiles are specified at the
inlet boundary, while the kinetic energy
$k_{in}$ and dissipation rate
$\varepsilon_{in}$ are given by the
following relations:


$k_{in} = 0.003\,u_{in}^2, \qquad \varepsilon_{in} = \frac{C_\mu\,k_{in}^{3/2}}{0.005\,D_{in}}$    (8)

where $D_{in}$ is the inlet span.


On the outlet boundary all variables are
calculated by extrapolation from the
interior. At the symmetry axis the first
derivatives of all variables are set equal
to zero, except for the v-component of the
velocity, which is itself set equal to
zero. On the solid surface the no-slip
condition is applied for the velocity
components. The kinetic energy and the
dissipation rate are defined at the first
grid point above the solid surface with the
use of wall functions (Ref 14).


Finally, as initial conditions, the u
velocity component is set equal to unity,
while the v velocity component and the
pressure vanish. The initial data for the
turbulence model variables are given by
equations (8).


for  the  pressure  are  given  in  Ref  21. 

These solutions show that the velocity is
a function of time only. This is a direct
reflection of the incompressible continuity
equation in a constant area tube. The
pressure fluctuation is a linear function
of x that vanishes at x=l to meet the
downstream boundary condition. Some
comparisons between numerical results and
the analytic solution are shown in Fig 1.
The calculated dimensionless velocity as a
function of time, and the dimensionless
pressure at three longitudinal positions
of the tube, are compared to the analytic
solution. Both numerical results are in
excellent agreement with the analytic
solution, demonstrating the reliability of
the present method for unsteady flows.

Two-dimensional  periodic  flow  between  par¬ 
allel  plates 

The oscillatory flow between two parallel
plates with a span of 2b is the second
test case we present. The Reynolds number
is based on the half distance b between
the two plates and the maximum inflow
velocity $u_0$. At x=0 the imposed uniform
inflow velocity is given by:


5.  RESULTS  AND  VALIDATION 


$u(t) = 1 \cdot \sin(Str \cdot t), \quad v(t) = 0$


Some  representative  results  of  several 
test  cases  are  shown  in  this  section.  It 
must  be  mentioned  that  all  the  quantities 
used  are  dimensionless.  The  dimensionless 
numbers  Reynolds,  Strouhal  and  Womersley 
are  defined  as: 


The  analytic  solution  for  the  velocity  and 
the  pressure  gradient  for  the  developed 
part  of  the  channel,  is  given  by  Moore 
(Ref  22).  The  Strouhal  number  is  equal  to 
10  and  the  Reynolds  number  is  equal  to 
1.6. 


respectively, where $\omega_{ref}$ is the
reference cyclic frequency.


A 75x29 grid is used for the current test
case, with length 4b and height 1b. The
lower boundary is a solid wall and the
upper one is a symmetry axis.

One cycle of the inflow velocity
oscillation is split into 10000 time
intervals and the dimensionless time step
obtained is:


Finally it must be noted that all the
results have been tested for various grids
and are independent of the grid density.

One-dimensional  oscillatory  flow 

In order to check the reliability of the
present method, it was initially developed
for one-dimensional flows and tested on an
oscillatory channel flow (Ref 21). In this
problem the back pressure of the channel
oscillates according to:

$p_{ex}(t) = p_0 + p_\varepsilon \sin(Str \cdot t)$

An analytic solution to this problem can
only be obtained if the pressure
perturbation $p_\varepsilon$ is small
compared to the mean back pressure $p_0$.
In this work these parameters are
$p_\varepsilon = 0.1$ and $p_0 = 1$. The
Strouhal number, based on the time mean
inflow velocity $u_0$ and the channel
length l, $Str = \omega_{ref}\,l/u_0$, is
chosen to be equal to 10.

The  analytic  solution  for  the  velocity  and 


$dt = \frac{2\pi}{Str \cdot 10000} = 2\pi \cdot 10^{-5}$
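
The arithmetic behind this time step is simple enough to check directly. A minimal sketch, assuming the dimensionless period is 2π/Str and that it is divided into equal intervals; the function name is illustrative only.

```python
import math

def time_step(strouhal, steps_per_period):
    """Dimensionless time step when one period 2*pi/Str is split evenly."""
    return 2.0 * math.pi / (strouhal * steps_per_period)

# Parallel-plate case quoted in the text: Str = 10, 10000 intervals per period.
dt_plates = time_step(10.0, 10000)
```

For Str = 10 and 10000 intervals this evaluates to 2π·10⁻⁵, about 6.28·10⁻⁵.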


In  Fig  2  the  developed  velocity  profiles 
at  different  physical  time  instants  are 
presented.  As  can  be  seen  the  numerical 
results  coincide  with  the  analytic  solu¬ 
tion.  In  Fig  3  the  velocity  as  a  function 
of  time  at  three  different  distances  from 
the  wall,  and  the  pressure  gradient  in  the 
developed  part  as  a  function  of  time  are 
presented. The numerical results are in
excellent agreement with the analytic
solution. It is clear that the unsteady
motion is predicted well after one quarter
of the first period, and this is one reason
for the use of small time steps.

Periodic  flow  in  axisymmetric  channel 

The third test case under consideration is
the periodic Stokes flow in a circular
tube, extensively presented and analysed
by many researchers (Ref 23, 24, 25, 26).
In the present paper the Reynolds number,
based on the radius a of the tube and the
maximum inflow velocity $u_0$, is taken
equal to 0.1 in order to approximate
Stokes flow. At x=0 the imposed velocity
profile is (Ref 26):

$u(t) = u(y) \cdot \cos(Str \cdot t), \quad v(t) = 0$

where u(y) is equal to unity except in the
near-wall region, where it approaches zero
parabolically. For the present case we
select the typical Womersley number
$W = a\sqrt{\omega_{ref}/\nu_{ref}} = \sqrt{30}$
and the Strouhal number becomes
$Str = a\,\omega_{ref}/u_0 = 300$. The time
step used is $2.094 \cdot 10^{-6}$.
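
The quoted values can be cross-checked against each other. The sketch below assumes the usual definitions Re = u0·a/ν, Str = a·ω/u0 and W = a·√(ω/ν), from which W² = Str·Re follows. These definitions are an assumption, since the defining equations are not legible in this copy.

```python
import math

def womersley(strouhal, reynolds):
    """Womersley number implied by Str and Re under the assumed definitions."""
    return math.sqrt(strouhal * reynolds)

# Circular-tube case quoted in the text: Re = 0.1 and Str = 300.
w_tube = womersley(300.0, 0.1)

# Time step if one period is again split into 10000 intervals (assumption).
dt_tube = 2.0 * math.pi / (300.0 * 10000)
```

With these inputs W² = 30, and the time step comes out at about 2.094·10⁻⁶, which agrees with the value stated above.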

A 45x40 grid is used, with length 1.2a
and height 1a. The lower boundary is a
solid wall and the upper one is a symmetry
axis. Solutions for the above relations are
given by Goldberg et al (Ref 26), in their
Table I.

In  Fig  4  the  comparisons  between  the  semi- 
analytic  solution  and  the  numerical  re¬ 
sults  provided  by  the  current  method  are 
given,  for  the  u-velocity  component,  at 
four instants of the physical time. The
agreement of the current numerical results
with the semi-analytic solution is very
good at all the time instants. The
discrepancies that occur in the centreline
velocity at ωt=0 and ωt=π are due to the
semi-analytic solution (Ref 26).

The  main  reason  that  this  test  case  is  ex¬ 
amined,  is  that  the  results  provided  by 
the  analytic  solution  concern  the  entire 
flowfield  along  the  tube,  in  contrast  to 
the  flow  between  the  two  parallel  plates 
where  results  only  for  the  developed  part 
of  the  flow  were  available.  In  addition 
the  Strouhal  number  is  much  larger  than  it 
was  in  the  previous  test  case. 

Unsteady  flow  behind  a  square  cylinder 

The  unsteady  flow  behind  a  square  cylinder 
is  presented  in  this  paragraph.  The  objec¬ 
tive  is  to  examine  the  reliability  of  the 
methodology  when  the  unsteadiness  of  the 
flow  is  due  to  the  viscosity  of  the  flow 
and  not  to  an  external  cause. 

The Reynolds numbers examined, based on
the uniform inflow velocity $u_0$ and the
square side a, are 100, 250, 500 and 750.
Three  different  grids  were  used  with 
100x56,  200x110  and  145x111  points.  The 
200x110  grid  is  shown  in  Fig  5.  The  points 
inside  the  square  are  blocked.  The  posi¬ 
tion  of  the  cylinder  and  of  all  the 
boundaries  are  those  shown  in  Fig  5,  and 
are  the  same  for  all  the  grids.  The  upper 
and  lower  boundaries  are  considered  to  be 
symmetry  axes. 

Indicative experimental studies concerning
this flow are those of Purtell and
Klebanoff (Ref 27) and Okajima (Ref 28).
Typical numerical studies are those of
Davis and Moore (Ref 29), Franke et al
(Ref 30) and Kelkar and Patankar (Ref 31).


In Fig 6 the Strouhal numbers $Str = fa/u_0$
predicted for all the grids and for
several time steps are shown. Comparisons
are made with other experimental data and
numerical results. The agreement is very
good. It can be seen that the results are
only slightly affected by the grid density
or the time step used. On the other hand,
the disagreement between the experimental
data presented in Fig 6 shows the
uncertainty and the sensitivity of the
flow.

In  Fig  7  the  vorticity  isolines  are  pre¬ 
sented  for  Reynolds  numbers  100  and  250. 

In  Fig  8  the  time  history  of  the  v- 
velocity  behind  the  cylinder  and  the  cor¬ 
responding  power  spectrum  are  presented. 

It  must  be  mentioned  that  for  Reynolds 
numbers  100  and  250  the  flow  is  periodic. 
For  larger  Reynolds  numbers  the  flow  be¬ 
comes  transitional  or  turbulent,  and  the 
time  histories  of  the  velocity  and  the 
pressure  show  a  chaotic  behaviour. 

Unsteady  turbulent  flow  behind  a  backward- 
facing  step 

In the present paper a numerical
investigation of the coherent vortices in
turbulence behind a backward-facing step
(Ref 32) is presented. The ratio of the
channel height W to the step height H is
2.5. The geometry and the inflow velocity
profile U(y) are the same as in the
experiments of Eaton and Johnston (Ref 33).
A 250x50 grid is used, a detail of which is
shown in Fig 9. The total length of the
channel is 50 step heights. Both the lower
and the upper boundaries are solid
surfaces. The Reynolds number based upon
the step height H and the maximum inflow
velocity $u_0$ is 38000. The time step used
is 0.0075.

In the first run the original k-ε model
was used. The flow that occurred was
steady. The recirculation length was 7.1H.
The main reason that a steady flow was
predicted is the overestimation of the
turbulent viscosity, which indirectly
reduces the Reynolds number. Thus a second
run was performed using a modified relation
for the turbulent viscosity:

$\nu_t = C_\mu f_\mu \frac{k^2}{\varepsilon}$

where

$f_\mu = f_0 + (1 - f_0)\{1 - \exp[-(y^+ - y_l^+)/A^+]\}$

is a function proposed by Miner et al (Ref
34) in order to reduce the turbulent
viscosity near the wall. The constants are
$f_0 = 0.04$, $y_l^+ = 8$ and $A^+ = 26$.
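
To see how such a near-wall damping function behaves, the relation above can be evaluated numerically. The sketch reads it as f = f0 + (1 − f0)·{1 − exp[−(y⁺ − y_l⁺)/A⁺]} with the constants 0.04, 8 and 26. Holding f at f0 for y⁺ below y_l⁺ is an assumption made here to keep the function non-negative, and it is not stated in the text.

```python
import numpy as np

def f_mu(y_plus, f0=0.04, y_l=8.0, a_plus=26.0):
    """Near-wall damping factor applied to the turbulent viscosity."""
    y_plus = np.asarray(y_plus, dtype=float)
    # Assumption: below y_l the factor is held at f0 instead of extrapolated.
    excess = np.maximum(y_plus - y_l, 0.0)
    return f0 + (1.0 - f0) * (1.0 - np.exp(-excess / a_plus))
```

The factor rises monotonically from f0 at y⁺ = y_l⁺ towards unity far from the wall, so the turbulent viscosity is reduced only in the near-wall layer.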

Using the above modification the flow
becomes unsteady. The pressure contours and
the vorticity contours are shown in Fig
10. The presence of a mixing layer behind
the step is clear. The recirculation
length (temporal mean) is overestimated
and is 8.1H, versus the experimental
result of 7.8H and the other numerical
result of Silveira Neto et al (Ref 32) of
6.8H. The eddies which impinge on the
lower wall, and are transported
downstream, are shed with a frequency f
that corresponds to a Strouhal number
$Str = fH/u_0 = 0.068$. This is in
excellent agreement with the experimental
data, where $Str \approx 0.07$.

In Fig 11 the time mean velocity profiles
at two different positions are shown, in
comparison to the experimental data of
Eaton and Johnston and the numerical
results of Silveira Neto et al. The
agreement of the present results with the
experimental data is very good. In Fig 12
the time mean kinetic energy profiles are
compared to the experimental data. The
agreement is very good. In both Fig 11 and
Fig 12 the results of the steady case are
also shown. In Fig 13 the temporal
evolution of the longitudinal velocity
component at x/H=7.59, y/H=0.1 and the
corresponding spectrum analysis are shown.

An interesting phenomenon that can be
observed in Fig 10 is the separation of the
boundary layer from the upper wall; it
generates a second street of coherent
vortices which are transported toward the
outlet of the channel with a Strouhal
number Str=0.068. This phenomenon has also
been observed in experiments performed by
Armaly et al (Ref 35) with
$Str \approx 0.07$.

6.  CONCLUSIONS 

An implicit projection methodology for the
solution of the unsteady Navier-Stokes
equations in collocated grids is presented
in this paper. The computational method is
based on the approximate factorization
technique and the incompressibility
constraint is satisfied by a Poisson
equation. Extended comparisons with
analytic solutions, experimental data and
numerical results provided by other
researchers lead to the conclusion that the
present methodology is a reliable tool for
solving a large range of unsteady problems.

Figure 1. Time evolution of velocity (left) and pressure (right) in the one-dimensional
flow. Comparison with analytic solution.


[Figure 2 panels: analytic solution (symbols) and current method (line) at ωt = 0, π/4, π/2, 3π/4, π, 5π/4, 3π/2 and 7π/4; horizontal axis u/u_ref from -1.50 to 1.50.]

Figure 2. Longitudinal velocity profiles at several time instants, in the developed
region of the two-dimensional channel.


[Figure 4 panels: analytic solution (symbols) at y/a = 1 and y/a = 0.025 and current method (line), with panels labelled ωt = 0, ωt = π and ωt = 3π/2; horizontal axis x/a from 0.00 to 1.20.]

Figure 4. Longitudinal velocity component along the circular tube for one cycle of the


Figure 5. The 200x110 grid for the flow around a square cylinder.

Figure 6. Strouhal number as function of the Reynolds number. Comparison with
experimental data and numerical results.

Abstract

A  solution-adaptive  structured  grid  technique  is  de¬ 
scribed  for  the  computation  of  steady  and  unsteady 
Euler  flows  past  aerofoils.  Transfinite  interpolation 
is  used  to  generate  the  grids  as  this  is  well-suited 
to  unsteady  flows,  since  grid  speeds  required  in  the 
flux  terms  are  available  directly  from  the  algebraic 
mapping.  A  novel  approach  to  grid  adaption  is  de¬ 
scribed.  Adaption  is  performed  by  adapting  the  in¬ 
terpolation  parameters,  instead  of  the  physical  grid 
positions,  so  the  adapted  grid  positions  are  available 
algebraically.  Hence,  the  grid  speeds  required  for  un¬ 
steady  computations  are  also  available  algebraically. 
For  unsteady  flows  grid  adaption  is  performed  by  im¬ 
posing  an  ‘adaption  velocity’  on  grid  points,  thereby 
applying  the  adaption  gradually  over  several  time 
steps  and  avoiding  the  interpolation  of  the  solution 
from  one  grid  to  another,  associated  with  instanta¬ 
neous  adaption.  Steady  and  unsteady  aerofoil  flows 
are  considered.  In  both  cases  the  adaptive  grid  tech¬ 
nique  is  shown  to  produce  sharper  shock  resolution 
for  a  very  small  increase  in  CPU  requirements. 

1  INTRODUCTION 

Increases  in  computer  power  have  meant  that  com¬ 
putational  methods  for  unsteady  flows  have  become 
commonplace.  However,  the  CPU  requirements  of 
these  methods  can  still  be  large.  Moving  grids  are  of¬ 
ten  used,  and  so  repeated  grid  generation  is  required, 
and  a  large  numerical  integration  time  may  be  neces¬ 
sary  to  reach  a  periodic  solution.  Grid  adaptivity  is 
therefore  desirable  to  improve  solution  resolution,  in 
regions  of  high  flow  gradients,  without  significantly 
increasing  the  CPU  requirements.  There  has  been 
much  recent  discussion  about  whether  structured  or 
unstructured  grids  are  best.  Unstructured  grids  ap¬ 
pear  to  have  the  advantage  of  lending  themselves 
more  naturally  to  grid  adaption  or  enrichment,  but 
the  computational  cost  can  be  large,  due  to  the  grid 
connectivity  data  required.  It  has  been  shown  [1] 
that  for  steady  computations  a  solution  computed 


using  an  unstructured  grid  requires  2  to  5  times  the 
CPU  time  of  that  on  a  structured  grid  with  the  same 
number  of  nodes.  The  situation  is  likely  to  be  worse 
for  unsteady  computations,  where  the  grid  must  be 
recomputed  at  least  once  per  time  step. 

This  paper  describes  a  solution-adaptive  grid  tech¬ 
nique  for  steady  and  unsteady  Euler  flows  using 
structured  grids  computed  by  the  transfinite  inter¬ 
polation  technique.  Transfinite  interpolation  is  well- 
suited  to  unsteady  computations  [2]  since  the  grid 
speeds  are  available  directly  from  the  interpolation 
equation. The grid generation is remarkably
simple: grid positions are obtained by
interpolation of boundary positions and grid
speeds by interpolation of boundary speeds,
the interpolation being the same in each case.

Structured grid adaption is often achieved by solving
a set of partial differential equations for the complete
domain. For example, Catherall [3] solves a combination
of Laplace, Poisson, and equidistribution equations
with source terms added to control grid stretching,
spacing, and orthogonality. Pericleous et al [4]
solve an equidistribution equation, based on solution
gradients, along each grid line in each coordinate
direction, then solve a Laplace equation for the
resulting grid positions to ensure orthogonality. Although
these approaches are suitable for steady flows, they
are less suitable for unsteady flows. When considering
moving grids the grid speeds are required in the
flux evaluation, and neither approach leads to an
obvious method of evaluating these speeds.

A  different  approach  is  presented  here,  wherein  a 
new  interpolation  technique  is  developed  and  grid 
adaption  is  performed  by  adapting  the  interpolation 
parameters  instead  of  the  physical  grid  positions.  In 
this  way  it  is  possible  to  determine  the  adapted  grid 
positions  algebraically.  This  represents  a  significant 
advantage  when  considering  unsteady  computations 
on moving grids. The resulting grid position
equation can simply be differentiated with respect to
time to yield the grid speeds algebraically.

However, there is a further problem encountered
when  adapting  the  grid  during  an  unsteady  com¬ 
putation.  The  conventional  steady  technique  is  to 
adapt  the  grid  instantaneously  and  interpolate  the 
solution  to  the  new  grid.  This  is  less  suitable  for  un¬ 
steady  flows  since  many  adaptions  are  required  over 
several  periods  of  motion,  and  repeated  interpola¬ 
tion  may  result  in  a  gradual  loss  of  accuracy.  Un¬ 
structured  adaptive  grids  have  been  developed  for 
unsteady  flows,  see  for  example  [5,  6],  and  regions  of 
high  gradients  are  simply  enriched  with  extra  points. 
However,  an  interpolation  step  is  still  required,  and 
this  has  been  shown  to  lead  to  a  conservation  loss, 
even  for  unstructured  grids,  [5]. 

The  adaption  for  unsteady  flows  is  carried  out  here 
by  imposing  an  ‘adaption  velocity’  onto  each  grid 
point,  thereby  moving  the  grid  points  from  one 
adapted  grid  position  to  the  next  over  several  time 
steps.  This  avoids  the  instantaneous  adaption  ap¬ 
proach  and  so  interpolation  is  not  required.  It  also 
requires  no  extra  grid  generation. 


and the flux across the face is simply $\bar{F}\Delta s$. This
general flux vector is split into a forward part $\bar{F}^+$
associated with positive moving waves only, i.e. all
eigenvalues of $\partial\bar{F}^+/\partial\mathbf{U} > 0$, and a backward part $\bar{F}^-$
associated with negative moving waves only, i.e. all
eigenvalues of $\partial\bar{F}^-/\partial\mathbf{U} < 0$. At each cell face a pair of states
is thus defined and a single numerical flux derived
from this pair. The split flux components are, see
Van Leer [7] and Parpia [8],

2 UPWIND DIFFERENCE SCHEME


A finite-volume upwind scheme is used to solve the
two-dimensional unsteady Euler equations in integral
form, for the domain $\Omega$ with boundary $\partial\Omega$,

$\frac{d}{dt}\iint_{\Omega} \mathbf{U}\,dx\,dy + \oint_{\partial\Omega} (F\,dy - G\,dx) = 0.$    (1)


f  Ay  Ax\(-U±2a) 

and $\bar{M}$ is the Mach number normal to the cell face
and a is the local acoustic speed. The above
splitting is only valid for $|\bar{M}| < 1$. Otherwise

$\bar{F}^+ = \bar{F}, \quad \bar{F}^- = 0, \quad \text{if } \bar{M} > 1,$    (12)


The vector of conserved variables $\mathbf{U}$ and convective
fluxes F and G, for moving grids, are:

$\mathbf{U} = [\rho, \rho u, \rho v, E]^T,$    (2)

$F = [\rho U, \rho u U + P, \rho v U, (E + P)U + x_t P]^T,$    (3)

$G = [\rho V, \rho u V, \rho v V + P, (E + P)V + y_t P]^T,$    (4)

and

$U = u - x_t, \quad V = v - y_t$    (5)

where $x_t$ and $y_t$ are the inertial grid speeds in the x
and y directions respectively.


The cartesian velocity components normal and
tangential to each computational cell face, and the
contravariant velocity normal to the cell face, are then

Here $\Delta x$ and $\Delta y$ are the cell face components and
$\Delta s$ is the face length. The general flux function in
the direction normal to the cell face is then

$\bar{F} = [\rho\bar{U}, \rho\bar{U}\bar{u} + P, \rho\bar{U}\bar{v}, E\bar{U} + P\bar{u}]^T,$    (8)


$\bar{F}^+ = 0, \quad \bar{F}^- = \bar{F}, \quad \text{if } \bar{M} < -1.$    (13)

The general flux vector is split by

$\bar{F} = \bar{F}^+(\mathbf{U}^+) + \bar{F}^-(\mathbf{U}^-).$    (14)

A third-order spatial interpolation is used to
evaluate $\mathbf{U}^+$ and $\mathbf{U}^-$ at each cell face, along with the
continuously differentiable flux limiter due to
Anderson et al [9].
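
The idea of the splitting can be illustrated on the mass flux alone. The sketch below uses the classical Van Leer splitting for a fixed grid, which is a simplification: the moving-grid form used in the paper (after Parpia) is not reproduced here, and the function name is hypothetical. For |M| < 1 the forward and backward parts are ±ρa(M ± 1)²/4; outside that range the whole flux goes to one side.

```python
import numpy as np

def van_leer_mass_flux(rho, a, mach):
    """Return (f_plus, f_minus), the split mass fluxes on a fixed grid."""
    mach = np.asarray(mach, dtype=float)
    full = rho * a * mach                              # unsplit mass flux
    sub_plus = 0.25 * rho * a * (mach + 1.0) ** 2      # subsonic forward part
    sub_minus = -0.25 * rho * a * (mach - 1.0) ** 2    # subsonic backward part
    f_plus = np.where(mach >= 1.0, full, np.where(mach <= -1.0, 0.0, sub_plus))
    f_minus = np.where(mach >= 1.0, 0.0, np.where(mach <= -1.0, full, sub_minus))
    return f_plus, f_minus
```

The two parts always sum to the unsplit flux, and they join the supersonic limits continuously at M = ±1, which is the property that makes the splitting differentiable there.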

Once $\bar{F}$ has been split into its components the
resulting flux must be rotated back to the original
coordinate system. This is achieved by

$F\Delta y - G\Delta x = R^{-1}[\bar{F}^+(\mathbf{U}^+) + \bar{F}^-(\mathbf{U}^-)]\Delta s$    (15)

where R is the rotation matrix.

An  explicit  three-stage  Runge-Kutta  scheme  is  used 
to  integrate  the  equations  forward  in  time.  Local 
time-stepping  is  used  for  steady  flows. 


3  GRID  GENERATION 

Unsteady flows using structured moving grids will be
considered. As the grid positions and speeds must be
repeatedly calculated during an unsteady computation,
we require a method of grid generation which
is simple, and which gives the speeds algebraically
rather than having to evaluate numerical differences
between grid positions on successive time levels. It
was thus decided to use the transfinite interpolation
method originally described by Gordon and Hall [10].
For the vector function

$\mathbf{f}(\eta, \xi) = [x(\eta, \xi), y(\eta, \xi)]$    (16)

which is known only on certain lines of the region

$\eta_1 \le \eta \le \eta_2, \quad \xi_1 \le \xi \le \xi_2$    (17)


Figure 1(a) shows the grid near a NACA0012 aerofoil,
resulting from the above interpolation, using
imax = 129 (99 points on the aerofoil surface, 15
in the wake either side), jmax = 30, st = 1.2,
and an outer boundary 20 chords away. Figure
1(b) shows the corresponding variation of $\eta_i$ and $\psi_j$.
(Grid points i = 13 - 117, j = 1 - 13 are shown.)

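By differentiating (25) with respect to time the grid
speeds can be obtained analytically (blending
functions assumed constant, and outer boundary fixed)

$\frac{\partial \mathbf{f}_{ij}}{\partial t} = \psi_j^0 \frac{\partial \mathbf{f}_i(0)}{\partial t} + \psi_j^1 \frac{\partial}{\partial t}\left(\frac{\partial \mathbf{f}_i}{\partial \xi}(0)\right)$    (26)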


transfinite interpolation gives the interpolated
function $\mathbf{f}(\eta, \xi)$ throughout the region by a direct
algebraic mapping. The general transfinite
interpolation method results in a recursive algorithm, see
Eriksson [11]. However, for a C-grid the inner and
outer boundaries are lines of constant $\xi$ where $\eta$ is
known. Defining one normal derivative only at the
inner boundary, the algorithm reduces to $\xi$ direction
interpolation only,

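$\mathbf{f}(\eta, \xi) = \psi^0(\xi)\,\mathbf{f}(\eta, 0) + \psi^1(\xi)\,\frac{\partial \mathbf{f}}{\partial \xi}(\eta, 0) + \psi^2(\xi)\,\mathbf{f}(\eta, 1).$    (18)

Here $\psi^0$, $\psi^1$ and $\psi^2$ are blending functions in the $\xi$
direction. The function $\mathbf{f}$ actually represents a
transformation from $(\eta, \xi)$ space to (x, y) space. The grid
points are indexed by i and j in the $\eta$ and $\xi$ directions
respectively, and each i and j line is defined
as a constant $\eta$ and $\xi$ line respectively. The variables
are normalised such that

$0 \le \eta, \xi, \psi^{0,1,2} \le 1.$    (19)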

The boundaries $\mathbf{f}(\eta, 0)$ and $\mathbf{f}(\eta, 1)$ are known at imax
discrete points, i.e. $\mathbf{f}_i(0)$ and $\mathbf{f}_i(1)$. The value of $\xi$
at each constant $\xi$ line is then defined as j/jmax.
The blending functions $\psi^0$ and $\psi^2$ control the spacing
in the $\xi$ direction, and $\psi^1$ controls how far the
normal direction affects the line direction. The most
effective blending functions have been found to be


=  e,  (24) 


where st is a stretching exponent. The imax x jmax
grid positions then come from

$\mathbf{f}_{ij} = \psi_j^0\,\mathbf{f}_i(0) + \psi_j^1\,\frac{\partial \mathbf{f}_i}{\partial \xi}(0) + \psi_j^2\,\mathbf{f}_i(1).$    (25)


Hence, grid positions are calculated by interpolation
of the boundary positions, and grid speeds by
interpolation of boundary speeds, the interpolation being
the same in each case.
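
The point that one interpolation serves both positions and speeds can be shown with a reduced sketch. It is not the interpolation of the paper: the normal-derivative term is omitted, and the power-law blending ψ = ξ^st, the index convention ξ = j/(jmax − 1) and the function name are all assumptions chosen for illustration.

```python
import numpy as np

def blend(inner, outer, jmax, st=1.2):
    """Interpolate between inner and outer boundary data of shape (imax, 2).

    Returns an array of shape (imax, jmax, 2). The same call is used for
    boundary positions (giving grid positions) and for boundary speeds
    (giving grid speeds).
    """
    xi = np.arange(jmax) / (jmax - 1)     # normalised coordinate from 0 to 1
    psi = xi ** st                        # assumed blending function
    w_outer = psi[None, :, None]
    w_inner = 1.0 - w_outer
    return w_inner * inner[:, None, :] + w_outer * outer[:, None, :]

# Usage: a moving inner boundary and a fixed outer boundary 20 radii away.
theta = np.linspace(0.0, 2.0 * np.pi, 9)
inner_xy = np.stack([np.cos(theta), np.sin(theta)], axis=1)
outer_xy = 20.0 * inner_xy
inner_speed = np.ones_like(inner_xy)      # hypothetical rigid translation
grid_xy = blend(inner_xy, outer_xy, jmax=5)
grid_speed = blend(inner_speed, np.zeros_like(inner_xy), jmax=5)
```

Because the blending weights do not depend on time, differentiating the position interpolation in time gives exactly the speed interpolation, so no differencing between successive grids is needed.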

4  GRID  ADAPTION 

The grid is to be adapted, according to the solution,
so that grid points are clustered in regions of high
gradients. Adaption is normally performed in (x, y)
space. However, while this gives suitable grids for
steady computations, the grid positions, and hence
more importantly grid speeds, would not be available
algebraically for unsteady computations. Only
numerical values of dx/dt and dy/dt could be evaluated
between different adapted grids during an unsteady
computation, and these could cause problems of grid
distortion and crossover when grid points move along
highly curved lines.

Adaption  is  achieved  here  by  writing  the  interpo¬ 
lation  function  in  a  more  general  form  and  adapting 
the  interpolation  parameters  instead  of  the  physical 
coordinates,  such  that  grid  positions  are  available  al¬ 
gebraically. 

Since each i line is a constant $\eta$ line, we can move
points along an i line by simply varying $\xi$ (or $\psi$)
along that line. The line remains unchanged; only
the distribution of points along it is altered.
Adaption in the $\xi$ direction is thus achieved by letting the
blending functions vary in $\eta$ as well as $\xi$, and
so we have

$\mathbf{f}_{i,j} = \psi^0_{i,j}\,\mathbf{f}_i(0) + \psi^1_{i,j}\,\frac{\partial \mathbf{f}_i}{\partial \xi}(0) + \psi^2_{i,j}\,\mathbf{f}_i(1).$    (27)

Figure 2(a) shows the near aerofoil grid resulting
from varying st from 1.1 to 1.3 depending on the
$\eta$ spacing, i.e. clustering points near the leading
and trailing edges, and figure 2(b) the corresponding
$\eta_i$, $\psi_{i,j}$ variation. To adapt in the other direction, we
must now change the interpolation so that each i line
is no longer constrained to be a line of constant $\eta$.
Along each j line $\eta$ is now varied to give the required
distribution in this direction (previously $\eta_i$ was the
same on every j line). The inner and outer boundaries,
$\mathbf{f}(\eta, 0)$ and $\mathbf{f}(\eta, 1)$, are determined in terms of
$\eta$, so that they are known at any point, not just the
specified points $\mathbf{f}_i(0)$. The interpolation is then


By adapting $\eta$ and $\psi$ instead of x and y the grid
positions are still available algebraically. This means
that the grid speeds are also available algebraically,
which is essential for efficient unsteady adaption.


4.1  Adaption  in  Each  Direction 

Instead  of  computing  a  completely  new  grid  due  to 
adaption,  it  is  desirable  to  simply  change  only  a  small 
region  of  the  grid  where  adaption  is  required. 

Adaption in the j direction is achieved by varying $\psi$ along each i line. For adaption in the i direction $\eta$ is changed along each j line to give the required distribution.


Adaption is required in regions where flow quantity gradients are high, and the local Mach number gradient is used as a sensor. At each point the Mach number gradient in each direction is evaluated,

$$\frac{\partial M}{\partial s_\eta} = \frac{1}{\Delta s_\eta}\,\left|M_{i,j} - M_{i-1,j}\right|, \qquad (29)$$

$$\frac{\partial M}{\partial s_\psi} = \frac{1}{\Delta s_\psi}\,\left|M_{i,j} - M_{i,j-1}\right|, \qquad (30)$$

where

$$\Delta s_\eta = \sqrt{(x_{i,j} - x_{i-1,j})^2 + (y_{i,j} - y_{i-1,j})^2}, \qquad (31)$$

$$\Delta s_\psi = \sqrt{(x_{i,j} - x_{i,j-1})^2 + (y_{i,j} - y_{i,j-1})^2}. \qquad (32)$$

The gradients at each point are normalised by the largest value over the domain. If this gradient is greater than a threshold value then adaption is deemed to be required at that point. There will usually be regions of points where adaption is required, e.g. 2 or 3 points around a shock and 5 to 10 points around a stagnation point, and so in each region the point with the largest Mach number gradient is identified. At each adaption point the spacing of grid points is controlled by defining two spacing factors, $f_{r1}$ for stagnation points, and $f_{r2}$ for other adaption points. Since parallel lines in $(\eta,\psi)$ space may not be parallel in $(x,y)$ space, the spacing at adaption points must be scaled thus,

$$\Delta\psi = \frac{\Delta\psi_0\,\Delta s_{\psi_0}}{f_{r1}\,\Delta s_\psi} \quad\text{or}\quad \Delta\psi = \frac{\Delta\psi_0\,\Delta s_{\psi_0}}{f_{r2}\,\Delta s_\psi}, \qquad (33)$$

$$\Delta\eta = \frac{\Delta\eta_0\,\Delta s_{\eta_0}}{f_{r1}\,\Delta s_\eta} \quad\text{or}\quad \Delta\eta = \frac{\Delta\eta_0\,\Delta s_{\eta_0}}{f_{r2}\,\Delta s_\eta}, \qquad (34)$$

where $2.0 < f_{r1}, f_{r2} < 5.0$, and $\Delta\eta_0$, $\Delta\psi_0$, $\Delta s_{\eta_0}$ and $\Delta s_{\psi_0}$ are the initial spacings.

Consider, for example, the variation of $\eta$ along the aerofoil surface, $\psi = 0$. An intermediate variable, $\zeta$, is defined so that $\eta = \eta(\zeta)$ where clearly $0 \le \zeta \le 1$. A uniform distribution of $\zeta$ is used, and then $\eta(\zeta)$ is defined to give the required distribution of points. Figure 3(a) shows the initial distribution of $\eta$ along the aerofoil for 99 points on the aerofoil. This is the unadapted distribution of points on the aerofoil.

For a solution where adaption is required in the $\eta$ direction, if for example a normal shock is present, $\Delta\eta$ is defined at that point using equation (34), and a cosine variation in $\eta$ is then used to return to the unadapted distribution of $\eta$ in as few points as possible. Figure 3(b) shows the variation of $\eta$ along the aerofoil surface for the flow considered in the next section, when normal shocks are present at approximately 0.64 chord on the upper surface and 0.32 chord on the lower. This simple sampling and adaption procedure is performed for each line in each direction.
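
The sensor and spacing rule of this section can be illustrated along a single grid line. The following is only a sketch under stated assumptions: the backward-difference form of the gradient, the normalisation over one line rather than over the whole domain, and all function names are choices made for this illustration, not the authors' code.

```python
import math

def arc_lengths(x, y):
    """Backward-difference arc lengths along a grid line (entry 0 is unused)."""
    return [0.0] + [math.hypot(x[i] - x[i - 1], y[i] - y[i - 1])
                    for i in range(1, len(x))]

def mach_sensor(mach, x, y):
    """Mach number gradient along the line, normalised by its largest value.

    Assumption: the gradient is a backward difference divided by the
    physical arc length; the paper normalises over the whole domain.
    """
    ds = arc_lengths(x, y)
    grad = [0.0] + [abs(mach[i] - mach[i - 1]) / ds[i]
                    for i in range(1, len(mach))]
    gmax = max(grad)
    return [g / gmax for g in grad] if gmax > 0.0 else grad

def adaption_points(sensor, threshold):
    """One index per contiguous flagged region: the point of largest sensor value."""
    points, region = [], []
    for i, s in enumerate(sensor):
        if s > threshold:
            region.append(i)
        elif region:
            points.append(max(region, key=lambda k: sensor[k]))
            region = []
    if region:
        points.append(max(region, key=lambda k: sensor[k]))
    return points

def scaled_spacing(d_param0, ds0, ds, factor):
    """Parameter spacing at an adaption point: initial spacing scaled by the
    ratio of initial to current physical spacing, divided by the spacing factor."""
    return d_param0 * ds0 / (factor * ds)
```

With a Mach number distribution containing one smeared shock, `adaption_points` returns the single index of steepest gradient, and `scaled_spacing` then gives the reduced parameter spacing to impose there.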


5  STEADY  FLOW  RESULTS 

The steady flow over a NACA0012 aerofoil at 1.25° incidence, in a flow of freestream Mach number 0.8, is considered. The initial grid is similar to that shown in Figure 2, i.e. a 129 x 30 C-grid, with st varying between 1.1 and 1.3 (clustering near the leading and trailing edges).

Figure  4(a)  shows  the  pressure  coefficient  over  the 
aerofoil  computed  on  the  non-adaptive  grid,  the 
dashed  line  is  the  reference  AGARD  solution  [12]. 

The grid was then adapted by applying the Mach number gradient check along each line (in $(x,y)$ space) and simply clustering points (in $(\eta,\psi)$ space) where this is greater than the threshold level. Figure 5 shows the resulting near-aerofoil variation of $\eta$ and $\psi$, and the corresponding grid. The variation in the $\psi$ direction is unchanged, since where the Mach number gradient is above the threshold level the grid spacing is at the minimum value already. Clearly the grid needs to be smoothed. This is often done by solving a Laplace equation for the grid point coordinates (see for example [4]). However, that is not required here. As the grid was initially smooth, a smoothing can be applied to the whole grid in each direction and the unadapted regions of the grid will be unaffected. A simple three-point smoothing is applied


Vi,j  —  f  + 

^Ij  =  ■ 


(35) 

(36) 
Figure 6 shows the smoothed variation of $\eta$ and $\psi$, and the corresponding grid. Figure 4(b) shows the
surface  pressure  coefficient  computed  on  the  adapted 
grid.  The  improved  shock  capturing  is  clear. 
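
The effect of a three-point smoothing on the interpolation parameters can be sketched as follows. The weights 1/4, 1/2, 1/4 are an assumption of this sketch (a common choice for such a filter), and the function name is hypothetical. The point illustrated is that a uniformly spaced (unadapted) distribution passes through the filter unchanged, while a kink introduced by adaption is spread over neighbouring points.

```python
def smooth_three_point(values, passes=1):
    """Apply a 1-2-1 weighted average to interior points; end points stay fixed.

    Assumption: weights (1/4, 1/2, 1/4); the smoothing acts on a parameter
    distribution such as eta or psi along one grid line.
    """
    v = list(values)
    for _ in range(passes):
        interior = [0.25 * v[i - 1] + 0.5 * v[i] + 0.25 * v[i + 1]
                    for i in range(1, len(v) - 1)]
        v = [v[0]] + interior + [v[-1]]
    return v

# Unadapted, uniformly spaced distribution: left unchanged by the filter.
uniform = [i / 8 for i in range(9)]
# Distribution with points clustered around the middle of the line.
clustered = [0.0, 0.25, 0.5, 0.5625, 0.625, 0.8125, 1.0]
smoothed = smooth_three_point(clustered)
```

Because the filter reproduces any locally linear distribution, it can be applied to the whole grid without first locating the adapted regions.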

In  many  steady  flow  adaption  procedures  the  grid  is 
allowed  to  adapt  gradually  by  effectively  progressing 
with  the  solution,  until  a  time-asymptotic  grid  and 
solution  are  reached.  The  grid  adaption  here  is  only 
applied  once,  as  this  will  be  the  case  when  an  un¬ 
steady  solution  is  periodically  sampled  as  in  section 
7.  The  adaption  only  has  one  ‘chance’  to  compute  a 
suitable  grid  at  each  adaption  point. 


6  UNSTEADY  EULER  METHOD 


The  explicit  time-stepping  scheme  used  for  steady 
flows  can  be  made  time-accurate  by  using  a  global 
time-step,  and  applied  to  unsteady  motion  on  a  mov¬ 
ing  mesh  by  incorporating  the  cell  area  changes  at 
each  stage  in  the  time-stepping  scheme  [13].  How¬ 
ever  for  a  typical  unsteady  computation,  with  the 
grid  size  above,  as  many  as  15000  time-steps,  and 
two  CPU  hours,  per  period  may  be  required.  It  is 
more  efficient  to  solve  the  unsteady  problem  as  a  se¬ 
ries  of  pseudo-steady  problems.  The  implicit  form  of 
the  differential  equation  for  each  computational  cell 
is 


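$$\frac{\partial\left(A^{n+1}\mathbf{U}^{n+1}\right)}{\partial t} + \mathbf{R}(\mathbf{U}^{n+1}) = 0 \qquad (37)$$

where A is the cell area and R is the upwinded flux integral. The implicit temporal derivative is then approximated by a second-order backward difference, following Jameson [14], giving

$$\frac{3}{2\Delta t}\left[A^{n+1}\mathbf{U}^{n+1}\right] - \frac{2}{\Delta t}\left[A^{n}\mathbf{U}^{n}\right] + \frac{1}{2\Delta t}\left[A^{n-1}\mathbf{U}^{n-1}\right] + \mathbf{R}(\mathbf{U}^{n+1}) = 0. \qquad (38)$$

A new residual $\mathbf{R}^*(\mathbf{U})$ is defined as

$$\mathbf{R}^*(\mathbf{U}) = \frac{3}{2\Delta t}\left[A^{n+1}\mathbf{U}\right] - \frac{2}{\Delta t}\left[A^{n}\mathbf{U}^{n}\right] + \frac{1}{2\Delta t}\left[A^{n-1}\mathbf{U}^{n-1}\right] + \mathbf{R}(\mathbf{U}) \qquad (39)$$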

and then a new differential equation can be written in terms of a fictitious time $\tau$,

$$A^{n+1}\frac{\partial \mathbf{U}}{\partial \tau} + \mathbf{R}^*(\mathbf{U}) = 0. \qquad (40)$$

This is simply time-marched to convergence in the fictitious time $\tau$, for each real time-step. There is now no limit to the size of the real time step, $\Delta t$, that can be taken and this leads to a large reduction in CPU times. The time step is now limited by accuracy rather than stability. For each real time step equations (40) are solved to convergence using an implicit form of the three stage time-stepping scheme with local time-stepping that is used for steady computations. This approach also means that the grid generation routine only needs to be called once every real time-step, to calculate the grid positions and speeds at the next time level.


6.1 Consideration of Cell Area Changes


If the cell areas at each time level or stage are simply calculated using the instantaneous physical coordinates of the cell faces a numerical error is introduced which will increase with time. The cell areas must therefore satisfy a geometric conservation law of the same integral form as the mass conservation law [15],

$$\frac{d}{dt}\iint_{\Omega} dx\,dy = \oint_{\partial\Omega} \left(x_t\,dy - y_t\,dx\right) \qquad (41)$$

and this must be solved using the same numerical scheme as for the flow quantities. The cell areas at the next real time level are thus calculated by

$$A^{n+1} = \frac{4}{3}A^{n} - \frac{1}{3}A^{n-1} + \frac{2\Delta t}{3}\sum_{k=1}^{4}\left(x_{t_k}\,\Delta y_k - y_{t_k}\,\Delta x_k\right)^{n+1} \qquad (42)$$

where k = 1, 2, 3, 4 represents the four cell faces.


7 UNSTEADY GRID ADAPTION

The normal steady flow adaption procedure is to compute the solution, sample it, and change the grid instantaneously. However, whether using structured grids, where a fixed number of grid points are redistributed to be clustered in regions of high gradient, or unstructured grids, where extra points are simply added in regions of high gradient, adaption results in grid points where the solution is not known. This then requires the interpolation of the solution from the old grid to the new. The repeated adaption and interpolation required over several periods in an unsteady computation can result in a gradual degeneration of the solution [5].

Also, the implicit scheme implemented here uses values of conserved variables and cell areas from previous time levels, which do not exist once the grid has been adapted, so instantaneous adaption cannot be applied to the unsteady solver used here.

To avoid this the grid adaption is spread over several (real) time steps and the motion of each point described in terms of an 'adaption velocity'. The periodic nature of the unsteady solution is exploited by sampling the solution over one period and adapting the grid accordingly over the next period. Therefore over one unsteady period the solution is sampled every nsamp real time steps, and the resulting adapted $(\eta,\psi)$ distribution stored. When calculating the solution on the next period, over each set of nsamp time steps the velocity required for each point to reach its position in the next adapted grid is imposed on that point, and the grid moves gradually between each adapted state.


If k is the adaption index (k = 0, ..., nadapt, where nadapt = nt/nsamp and nt is the number of real time steps per period), then $(\eta^{(k)}_{i,j}, \psi^{(k)}_{i,j})$ is the grid point distribution at adaption k. To move the grid points from one distribution to the next over nsamp time steps we calculate the speed of each point, in $(\eta,\psi)$ space


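$$\frac{d\eta_{i,j}}{dt} = \frac{\eta^{(k+1)}_{i,j} - \eta^{(k)}_{i,j}}{\mathrm{nsamp}\,\Delta t}, \qquad (43)$$

$$\frac{d\psi_{i,j}}{dt} = \frac{\psi^{(k+1)}_{i,j} - \psi^{(k)}_{i,j}}{\mathrm{nsamp}\,\Delta t}. \qquad (44)$$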

The  grid  speeds  are  obtained  by  differentiating  equa¬ 
tion  28  with  respect  to  time. 


d 


di’h. 


dip}  ,•  d  dijif  ,■ 


dt 


dt 


(45) 

Then  superimposing  the  adaption  speeds  onto  the 
unsteady  motion  speeds,  and  replacing  ^  by 
where  required,  we  obtain  (the  outer  boundary  is 
fixed  in  time  so  1)  =  0  due  to  motion) 


dt  dt  dt  ’ 


+ip 


dlij  d  ,  dpij  d  M 


dt  dr] 
diiij  d  (  d 


dt  di] 


dt  dr] 
dt  ( 


m 


M 


(46) 


where  the  superscript  M  represents  speeds  due  to 
the  aerofoil  motion.  The  implicit  code  is  run  with 
the  unadapted  grid  for  two  periods,  the  adaptive  grid 
data  being  stored  during  the  nadapt  samples  of  the 
second  period,  and  then  two  periods  of  adaptive  grid 
computations  are  performed. 
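
The gradual adaption of equations (43) and (44) amounts to linear motion of each interpolation parameter between two stored adapted states. A minimal sketch for one parameter along one grid line follows; the function names and the sample values are hypothetical, and the conversion of these parameter speeds into physical grid speeds (which the paper obtains by differentiating the interpolation) is not shown.

```python
def adaption_speed(current, target, nsamp, dt):
    """Constant parameter speed that carries each point from its current
    value to its target value in nsamp real time steps of size dt."""
    return [(t - c) / (nsamp * dt) for c, t in zip(current, target)]

def advance(values, speeds, dt):
    """Move every point by one real time step."""
    return [v + s * dt for v, s in zip(values, speeds)]

# Hypothetical eta distributions at adaption k and k + 1.
eta_k = [0.0, 0.25, 0.5, 0.75, 1.0]
eta_k1 = [0.0, 0.30, 0.55, 0.70, 1.0]
nsamp, dt = 10, 0.05

speeds = adaption_speed(eta_k, eta_k1, nsamp, dt)
eta = list(eta_k)
for _ in range(nsamp):
    eta = advance(eta, speeds, dt)
# After nsamp steps eta coincides with eta_k1 up to rounding.
```

No solution interpolation is needed at any stage, since the grid never jumps: the same points simply move with a known speed.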


8  UNSTEADY  RESULTS 

The scheme was applied to the Mach 0.755 flow about a NACA0012 aerofoil pitching about quarter chord. The aerofoil motion is defined by

$$\alpha = 0.016^\circ + 2.51^\circ \sin(\omega t). \qquad (47)$$

The reduced frequency parameter, $k = \omega c / (2U_\infty)$, was 0.0814, where c is the aerofoil chord and $U_\infty$ is the undisturbed flow speed. The scheme was run at a CFL number, based on $\tau$, of 1.4, and local time stepping was used to accelerate convergence within each real time step. There were 180 real time steps per period and the same grid data was used as previously, 129 x 30 points, with 99 points on the aerofoil. In the adaptive computation nsamp was 10 and so nadapt was 18.

Figure 7 shows normal force and moment (about quarter chord) coefficient loops obtained by the implicit
method,  adaptive  and  non-adaptive,  and  from  exper¬ 
iment  [16].  The  coefficient  loops  are  quite  similar, 
but  the  adaptive  Cn  loop  is  slightly  narrower,  and 
the  Cm  loop  has  larger  ‘steps’,  than  the  standard  so¬ 
lution.  The  instantaneous  pressure  distributions  are 
shown  in  figure  8.  The  improved  shock  capturing 
with  the  adaptive  grid  is  clear.  Figure  9  shows  the 
near  aerofoil  adaptive  grid  at  each  of  the  incidences 
considered  in  figure  8. 

The  non-adaptive  scheme  required  19  CPU  minutes 
per  period  on  a  Stardent  3000  machine,  and  the 
adaptive  grid  solution  approximately  22  CPU  min¬ 
utes.  An  explicit  time-stepping  scheme  required  ap¬ 
proximately  15000  time-steps  and  two  CPU  hours 
per  period  [13].  Thus  the  implicit  method  requires 
only  one-fifth  of  the  CPU  time  of  an  explicit  scheme, 
even  with  an  adaptive  grid. 
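
The dual-time idea behind these timings can be sketched on a scalar model problem, $du/dt + \lambda u = 0$ with unit cell area. This is only an illustration under stated assumptions: a forward-Euler march in pseudo-time replaces the implicit three-stage scheme of the paper, and the function names, the value of $\lambda$ and the pseudo-time step are choices made for this sketch.

```python
import math

def dual_time_step(u_n, u_nm1, dt, residual, dtau, tol=1e-12, max_iter=10000):
    """Advance one real time step by driving the modified residual
    R*(u) = (3u - 4u_n + u_nm1) / (2 dt) + R(u) to zero in pseudo-time."""
    u = u_n
    for _ in range(max_iter):
        r_star = (3.0 * u - 4.0 * u_n + u_nm1) / (2.0 * dt) + residual(u)
        if abs(r_star) < tol:
            break
        u -= dtau * r_star  # explicit pseudo-time update (assumption)
    return u

def integrate(lam=1.0, dt=0.1, n_steps=10):
    """Integrate du/dt + lam * u = 0 from u(0) = 1 to t = n_steps * dt."""
    def residual(u):
        return lam * u
    # Pseudo-time step chosen so that each sweep halves the remaining error.
    dtau = 0.5 / (1.5 / dt + lam)
    u_nm1, u_n = 1.0, math.exp(-lam * dt)  # two exact starting levels
    for _ in range(n_steps - 1):
        u_nm1, u_n = u_n, dual_time_step(u_n, u_nm1, dt, residual, dtau)
    return u_n
```

The real time step `dt` here is set by the accuracy wanted from the second-order backward difference, while the pseudo-time step `dtau` is set only by the stability of the inner iteration, which is the separation the method relies on.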


9  CONCLUSIONS 

Steady  and  unsteady  solutions  have  been  computed 
using  non-adaptive  and  adaptive  grids  generated 
by  a  new  transfinite  interpolation  technique.  Grid 
adaption  is  performed  by  adapting  the  interpolation 
parameters,  instead  of  the  physical  grid  positions,  so 
that  the  adapted  grid  positions  are  still  available  al¬ 
gebraically.  This  interpolation  has  been  shown  to  be 
ideal  for  generating  structured  moving  grids,  since 
it  is  very  simple,  thus  requires  little  CPU  time,  and 
since  the  grid  speeds,  even  for  adapted  grids,  are 
available  directly  from  the  interpolation  equations. 
The  simplicity  of  the  interpolation  results  in  great 
flexibility,  and  we  can  adapt  the  grid  during  an  un¬ 
steady  computation  by  imposing  an  ‘adaption  veloc¬ 
ity’  onto  each  grid  point,  thus  performing  adaption 
gradually.  This  avoids  the  interpolation  of  the  so¬ 
lution  from  the  old  grid  to  the  new  associated  with 
instantaneous  adaption. 

An upwind Euler scheme is used to compute the solutions. This is implemented using a dual-time implicit method for unsteady flows which is very efficient, requiring only one-fifth of the CPU time of the explicit scheme.

For  steady  and  unsteady  aerofoil  computations,  the 
adaptive  grid  method  produces  sharper  solutions  for 
very  little  increase  in  CPU  requirements. 
Currently, only a fairly crude grid redistribution technique is employed. Future work will include developing a more sophisticated method, along with extending the adaptive technique into three dimensions. The method should be equally simple, the only difficulty arising from the third dimension being that the boundary definition will involve determining spline equations for surfaces rather than lines.

Fig. 1. Near Aerofoil Grid (a) $(x,y)$ and (b) $(\eta,\psi)$


Fig. 5. Near Aerofoil Adapted Grid (a) $(\eta,\psi)$ and (b) $(x,y)$


Fig. 6. Near Aerofoil Smoothed Adapted Grid (a) $(\eta,\psi)$

Fig. 7. Unsteady Normal Force and Moment Coefficient.

1 Abstract

The features and abilities of the DLR-τ-code, a finite volume approximation of box type for the Navier-Stokes equations governing viscous, compressible fluid flow, are described in detail. The code is able to compute flow in moving reference frames and is built upon dynamically adaptive concepts to allow for grid refinement in the framework of non-stationary aerodynamics. Implicit as well as explicit time-stepping schemes can be used depending on the kind of application.

2  Introduction 

The DLR-τ-code is a finite volume approximation of the Navier-Stokes equations governing compressible, viscous flow. The method uses a box-type discretisation and works on general conforming triangulations. The discretisation of the convective fluxes is accomplished by means of an approximate Riemann solver while the diffusive fluxes are discretised in a central manner.

To achieve high resolution, recovery techniques of ENO-type are applied. New recovery techniques are presented which are based on radial basis functions. Although their use is restricted for now to small problems, it can be shown that they obey a certain optimality condition.

Explicit  time  stepping  through  TVD-Runge-Kutta 
methods  is  used  in  a  parallelized  version  of  the  code. 
This  parallelized  version  includes  an  intelligent  load 
balancer  for  performance-controlled  domain  decom¬ 
position  and  can  handle  arbitrary  message  passing 
libraries  like  PVM  or  P4. 

To effectively deal with unsteady flow problems such as pitching airfoils and moving bodies in general, the implementation of implicit time stepping schemes is also considered. The development of an implicit method on unstructured grids leads to a linear system of equations with a large, sparse and badly conditioned matrix. In this case, the fundamental mathematical task is the description of a fast solver for such linear systems of equations. Extensive investigations with several possible algorithms indicated the superiority of a pre-conditioned GMRES algorithm. The preconditioner is a simple incomplete LU-factorization which dramatically improves the convergence properties of GMRES in the case of the Euler equations. Experience gained by numerical investigations has shown that even fast unsteady flow phenomena like moving shocks in channels can be effectively treated by this combination of algorithms.

The τ-code employs dynamically adaptive strategies based on insertion and removal of grid points. Conservative interpolation avoids mass errors during the process of adaptation. One of the main design goals was the use of reliable error indicators instead of refinement indicators based on gradients of flow variables. The indicators we consider are based on the finite element residual of the Euler equations.


3  Governing  equations 


We  consider  the  Navier-Stokes  equations  in  a  mov¬ 
ing  reference  frame.  In  this  context  the  governing 
equations  are  given  in  the  form 


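Integration is performed on time-dependent control volumes $\sigma(t) \subset \mathbb{R}^3$ with outer unit normal vector n. Here, $u = (\rho, \rho v_1, \rho v_2, \rho v_3, \rho E)^T$ denotes the vector of conserved variables, $f^c_j$ and $f^v_j$ are the convective and viscous fluxes, respectively, given by

$$f^c_j(u) := \begin{pmatrix} \rho V_j \\ \rho v_1 V_j + \delta_{1j}\,p \\ \rho v_2 V_j + \delta_{2j}\,p \\ \rho v_3 V_j + \delta_{3j}\,p \\ \rho H V_j + p\,v_{\mathrm{grid},j} \end{pmatrix},$$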


L-iE) 


I  °  \ 

T2j 

^3j 

\  Ut  =  l  / 


The quantity e denotes internal energy which is given by $e = E - \frac{1}{2}(v_1^2 + v_2^2 + v_3^2)$ and the enthalpy H is defined as $H := E + p/\rho$. Pressure is given by the equation of state $p = (\gamma - 1)\,\rho\,\left(E - \frac{1}{2}(v_1^2 + v_2^2 + v_3^2)\right)$, $\gamma$ being the ratio of specific heats. The temperature is given by $T = \gamma(\gamma - 1)\,\mathrm{Ma}_\infty^2\,e$ and the elements of the shear stress tensor are $\tau_{ij} = \mu\,(\partial_{x_j} v_i + \partial_{x_i} v_j) + \delta_{ij}\,\lambda\,(\partial_{x_1} v_1 + \partial_{x_2} v_2 + \partial_{x_3} v_3)$, with the viscosity assumed to follow the Sutherland law $\mu = T^{3/2}(1 + S)/(T + S)$, where $S = 110\,\mathrm{K}/T_\infty$. Moreover, the connection between the second viscosity coefficient $\lambda$ and the viscosity $\mu$ is defined by Stokes' hypothesis to be $\lambda = -\frac{2}{3}\mu$.

The velocity $v_{\mathrm{grid}}$ is the velocity of the moving reference frame and $V_j := v_j - v_{\mathrm{grid},j}$ denotes the contravariant velocity.
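
The thermodynamic closure of this section can be collected into a few helper functions. The sketch below assumes the usual non-dimensionalisation in which the freestream density, speed and temperature are all one (so that the freestream pressure is $1/(\gamma\,\mathrm{Ma}_\infty^2)$); this scaling and all function names are assumptions of the illustration, not taken from the code described in the paper.

```python
def internal_energy(E, v):
    """e = E - |v|^2 / 2 for total specific energy E and velocity tuple v."""
    return E - 0.5 * sum(c * c for c in v)

def pressure(rho, E, v, gamma=1.4):
    """Perfect-gas equation of state p = (gamma - 1) * rho * e."""
    return (gamma - 1.0) * rho * internal_energy(E, v)

def enthalpy(rho, E, v, gamma=1.4):
    """Total specific enthalpy H = E + p / rho."""
    return E + pressure(rho, E, v, gamma) / rho

def temperature(E, v, mach_inf, gamma=1.4):
    """Non-dimensional temperature T = gamma (gamma - 1) Ma_inf^2 e."""
    return gamma * (gamma - 1.0) * mach_inf ** 2 * internal_energy(E, v)

def sutherland_viscosity(T, T_inf_kelvin):
    """Non-dimensional Sutherland law with S = 110 K / T_inf."""
    S = 110.0 / T_inf_kelvin
    return T ** 1.5 * (1.0 + S) / (T + S)

def second_viscosity(mu):
    """Stokes' hypothesis: lambda = -2/3 mu."""
    return -2.0 * mu / 3.0
```

A quick consistency check is the freestream state itself: with unit density and speed and pressure $1/(\gamma\,\mathrm{Ma}_\infty^2)$, both `temperature` and `sutherland_viscosity` should return one.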


4 The DLR-τ-code

In  order  to  simplify  notation  we  describe  the  details 
of  our  numerical  method  in  two  space  dimensions. 
The  extension  to  three  space  dimensions  follows  by 
straightforward  considerations  based  on  the  2-d  case. 


4.1  Finite  volume  approximation 

We consider conforming triangulations $\mathcal{T}_h$ consisting of tetrahedra (triangles in 2-d) in the sense of Ciarlet [5] and define a discrete control volume $\sigma_i(t)$ as the volume of the barycentric subdivision of $\mathcal{T}_h$ enclosing the node $x_i = (x_{i1}, x_{i2})^T$ and bounded by the straight line segments $l^k_{ij}$, k = 1, 2, connecting the midpoint of the edge with the point $x_s$. The geometry of the control volumes is shown in figure 1. Figure 2 shows the boundary of a control volume and serves to define our notations.


Figure 1: Control volumes in 2-d


The point $x_s$ is defined by

with

in order to account for highly stretched meshes in boundary layer regions.

Figure 2: Boundary of a control volume in 2-d


Utilizing our notion of control volumes and denoting the cell average on $\sigma_i(t)$ by $u_i(t) := \frac{1}{|\sigma_i(t)|}\int_{\sigma_i(t)} u\,dx$, the Navier-Stokes equations (1) can be re-written in the form


d  ,  , 


1 


1=1 


E  E/.E(  sirSfe) 


where $N(i) := \{\,j \mid \partial\sigma_j \cap \partial\sigma_i \neq \emptyset\,\}$ is the set of indexes of nodes neighbouring node i. Since the line integrals are not defined if u is discontinuous, two numerical flux functions, H and G, are introduced, approximating the convective and viscous fluxes, respectively, and satisfying the fundamental consistency conditions

2  2 

!:=i 

In our implementation the combined Riemann solver AUSMDV following Liou and Wada [33] is used for the numerical flux H, which includes Hänel's scheme [10] and was extended in [18] for use in an implicit formulation considering moving grids. Several other choices, like Roe's or Osher's Riemann solver, are easily implemented in the current framework. The viscous fluxes are discretised by the central difference


v-^  /  Mr  +  \ 
t=l  ^  ' 

Applying  the  midpoint  rule  to  the  integral  along  ifj 
results  in 


+0{h^)  +  0ih.^), 
where the first error term is due to the quadrature rule while the second error term depends on the functions u. Using cell average values, i.e. taking the cell average $u_i$ itself as the approximation on $\sigma_i$, results in a first order approximation, i.e. q = 1, due to the weak approximation property of the cell average operator, see [22], [23]. To increase the approximation order a recovery function is sought on $\sigma_i$ which approximates u at least with order $O(h^2)$. It is easily seen that linear polynomials recover u up to this order.

4.2  Recovery  algorithms 

If $x_i^b$ denotes the barycentre of the control volume $\sigma_i$, then a linear polynomial

$$u_i(x,t) = \sum_{|\alpha| \le 1} a_\alpha(t)\,(x - x_i^b)^\alpha \qquad (2)$$

has to be recovered on $\sigma_i(t)$ such that it satisfies the recovery condition

Recovery in box-type methods is best described in terms of a meta-triangulation $\mathcal{T}_h^b$ which is defined to be the triangulation of the barycentres of control volume $\sigma_i$ and the surrounding boxes if $x_i^b$ is connected with each of the surrounding $x_j^b$, see figure 3. If $E(i) := \{\,T \in \mathcal{T}_h^b \mid x_i^b \in T\,\}$ denotes the set of the meta-triangles surrounding $x_i^b$ then a linear polynomial $\pi_T$ can be computed on each of the T.


Figure 3: Meta-triangulation of the barycentres

In a TVD-like approach, compare [24], the gradient $(a_{10}, a_{01})$ of the recovery polynomial (2) can be obtained from the linear interpolants in a completely isotropic manner, namely

Fi|  ^  4<r,nT 


Then, defining $a_{00} := u_i$, the polynomial (2) will certainly satisfy the recovery condition. However, the isotropic recovery of the gradients does not take care of shocks in the solution and will thus lead to instabilities. According to the TVD methodology a slope limiter $\Phi_i$ has to be introduced such that the recovery polynomial is written in the form

$$u_i(x,t) = a_{00} + \Phi_i \sum_{|\alpha| = 1} a_\alpha\,(x - x_i^b)^\alpha.$$

We have good experience in using the limiter described by Barth and Jespersen in [4], but convergence to steady state is enhanced if one adds a modification as suggested by Venkatakrishnan in [31].
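
The limiter of Barth and Jespersen mentioned above can be sketched for one scalar quantity on one control volume. The sketch uses the commonly published form of that limiter; the function names, and the choice of which neighbouring averages and which evaluation points on the box boundary are passed in, are assumptions of this illustration.

```python
def barth_jespersen(u_i, neighbour_values, unlimited_values):
    """Return the limiter value Phi in [0, 1] for one control volume.

    u_i               cell average in the control volume
    neighbour_values  cell averages of the neighbouring control volumes
    unlimited_values  values of the unlimited linear recovery polynomial
                      at the evaluation points on the box boundary
    """
    u_max = max([u_i] + list(neighbour_values))
    u_min = min([u_i] + list(neighbour_values))
    phi = 1.0
    for u_point in unlimited_values:
        delta = u_point - u_i
        if delta > 0.0:
            phi = min(phi, (u_max - u_i) / delta)
        elif delta < 0.0:
            phi = min(phi, (u_min - u_i) / delta)
    return phi

def limited_value(u_i, gradient, offset, phi):
    """Evaluate the limited linear recovery polynomial at x = x_b + offset."""
    return u_i + phi * (gradient[0] * offset[0] + gradient[1] * offset[1])
```

In smooth regions the limiter returns one and the full linear recovery is kept; at a local extremum it returns zero and the scheme falls back to the cell average there.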

A simple ENO-type recovery can also be described in terms of the linear interpolants $\pi_T$. The linear recovery polynomial $u_i$ on the box is then chosen to be the one linear polynomial on the surrounding meta-triangles for which the modulus of the gradient is minimal, i.e. for which

$$\left|\nabla \pi_T^{\,l}\right| = \min_{T' \in E(i)} \left|\nabla \pi_{T'}^{\,l}\right|$$

is valid, where the superscript l denotes the l-th component.

Experience with this type of recovery is reported in [22] and [25].
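
The minimum-gradient selection just described can be sketched for a single scalar component. In this illustration the meta-triangles are formed as a closed fan of consecutive neighbouring barycentres around the central one; that construction, the ordering requirement on the neighbours and the function names are assumptions made for the sketch.

```python
import math

def linear_gradient(p0, p1, p2, u0, u1, u2):
    """Gradient of the linear interpolant through three (point, value) pairs."""
    ax, ay = p1[0] - p0[0], p1[1] - p0[1]
    bx, by = p2[0] - p0[0], p2[1] - p0[1]
    det = ax * by - ay * bx
    du1, du2 = u1 - u0, u2 - u0
    return ((du1 * by - du2 * ay) / det, (du2 * ax - du1 * bx) / det)

def eno_gradient(centre, u_centre, ring, u_ring):
    """Among the meta-triangles (centre, ring[k], ring[k+1]) pick the linear
    interpolant whose gradient has the smallest modulus.

    ring must list the neighbouring barycentres in order around the centre.
    """
    best = None
    n = len(ring)
    for k in range(n):
        m = (k + 1) % n
        g = linear_gradient(centre, ring[k], ring[m],
                            u_centre, u_ring[k], u_ring[m])
        if best is None or math.hypot(*g) < math.hypot(*best):
            best = g
    return best
```

For data sampled from a linear function every meta-triangle yields the same gradient, while next to a jump the selection prefers a triangle that does not cross the discontinuity.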

In order to further increase the spatial accuracy of the DLR-τ-code we are currently working on the extension towards a third order scheme by recovering quadratic polynomials close to the ideas of Abgrall, see [1], [2]. In [3] an algorithm based on Mühlbach expansions was developed which allows the efficient and stable computation of quadratic recovery polynomials in a step-by-step manner. Preliminary numerical results concerning a third-order τ-code are given in [15].

4.3  Optimal  recovery 

Although polynomial recovery functions seem to be attractive at first glance for their simplicity, their main drawback lies in the enormous widening of the stencil if higher order recoveries are sought. On the other hand, even locally defined polynomials of high degree exhibit weak properties concerning their oscillatory behaviour. Additionally, as can be seen from application of the theory of Optimal Recovery as reviewed in [22], polynomials do not satisfy any optimality condition with respect to their recovery properties. We do want to recover a function $u_i$ on $\sigma_i$ for which the difference

$$|u_i(y,t) - u(y,t)|$$

between the recovery function and the true solution at the Gauss points y is smallest. Functions minimizing semi-norms in their associated function spaces (i.e. splines) are exactly those functions for which the above quantity is minimal. In multiple space dimensions splines are found in the class of radial basis functions, for example the well-known thin-plate spline. First experiments with this kind of recovery functions in [22], [23] showed an impressive increase in accuracy. Although recovery with radial basis functions is much more expensive than polynomial recovery, the techniques developed in [3] could very well provide a framework in which these more complicated functions could be competitive with polynomial algorithms.

In recovery with radial basis functions a recovery function of the form

$$u_i^{\,l}(x,t) = \sum_{j=0}^{N-1} \lambda_j\,A(\sigma_j)^y\,\Phi(|x - y|) + \sum_{k=1}^{M} \mu_k\,\pi_k(x)$$

is sought for the l-th component, where $A(\sigma_i)\,f := \frac{1}{|\sigma_i|}\int_{\sigma_i} f(y)\,dy$ denotes the cell average operator. The radial function $\Phi$ is assumed to satisfy the fundamental condition of being conditionally positive definite and $\pi_k$, k = 1, ..., M, denote a basis of the space of polynomials of a certain degree, which depends on the radial function $\Phi$ chosen. The number of nodes N in the recovery stencil is another quantity which has to be chosen. Numerical experience gained so far has indicated that polynomial-based ENO stencil selection criteria work well also in the case of radial basis functions.

Using the well known thin plate spline

$$u_i^{\,l}(x,t) = \sum_{j=0}^{N-1} \lambda_j\,A(\sigma_j)^y\left(|x - y|^2 \log|x - y|\right) + \sum_{|\alpha| < 2} \mu_\alpha\,x^\alpha$$

which, by construction, is able to reproduce linear polynomials, amounts to using at least four control volumes in the stencil. In an ENO-like manner one can think of the stencil selection according to figure 4, where the control volumes were chosen to be triangles.


Figure 4: The construction of four node sets out of a certain neighbourhood


Figure  4:  The  construction  of  four  node  sets  out  of 
a  certain  neighbourhood 

If on each of the four stencil sets a radial basis recovery function is computed, the one with the smallest total variation norm is selected and assigned to the control volume. First results of this procedure are reported in section 6.
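
The structure of a thin plate spline recovery on a four-node stencil can be sketched with point values. This is a simplification made for illustration: the paper applies the cell average operator to the radial function, whereas the sketch interpolates nodal values, and the small dense solver and all names are hypothetical helpers.

```python
import math

def tps(r):
    """Thin plate spline kernel r^2 log r, with the limit value 0 at r = 0."""
    return 0.0 if r == 0.0 else r * r * math.log(r)

def solve(a, b):
    """Gaussian elimination with partial pivoting for a small dense system."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for c in range(n):
        p = max(range(c, n), key=lambda r: abs(m[r][c]))
        m[c], m[p] = m[p], m[c]
        for r in range(c + 1, n):
            f = m[r][c] / m[c][c]
            for k in range(c, n + 1):
                m[r][k] -= f * m[c][k]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        s = sum(m[r][k] * x[k] for k in range(r + 1, n))
        x[r] = (m[r][n] - s) / m[r][r]
    return x

def fit_tps(points, values):
    """Solve for the radial weights and the linear polynomial (1, x, y)."""
    n = len(points)
    a = [[0.0] * (n + 3) for _ in range(n + 3)]
    for i, (xi, yi) in enumerate(points):
        for j, (xj, yj) in enumerate(points):
            a[i][j] = tps(math.hypot(xi - xj, yi - yj))
        a[i][n], a[i][n + 1], a[i][n + 2] = 1.0, xi, yi
        a[n][i], a[n + 1][i], a[n + 2][i] = 1.0, xi, yi
    coeffs = solve(a, list(values) + [0.0, 0.0, 0.0])
    return coeffs[:n], coeffs[n:]

def evaluate(points, weights, poly, x, y):
    """Evaluate the recovered function at (x, y)."""
    s = poly[0] + poly[1] * x + poly[2] * y
    for (xj, yj), w in zip(points, weights):
        s += w * tps(math.hypot(x - xj, y - yj))
    return s
```

With four nodes that are not collinear the system is uniquely solvable, and data taken from a linear function are reproduced exactly, with all radial weights vanishing, which is the reproduction property referred to in the text.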

Meanwhile, radial basis functions with compact support are being constructed. We mention the class of Wu functions as designed in [36] and the very recent developments of Wendland [34]. These functions are unconditionally positive definite and thus do not need the polynomial augmentation required by the thin plate spline. Furthermore, their compact support makes them very attractive for practical purposes. Whether these functions can be competitive in run-time with polynomial-based recovery algorithms is the subject of future research on ENO approximations.

4.4 Time stepping schemes and parallelism

The DLR-τ-code was originally supplied with an explicit Runge-Kutta time stepping algorithm designed by Shu and Osher in [21] which respects the TVD-properties of the spatial discretisation, see [24]. However, these schemes are limited to CFL numbers of at most 1, which is a severe restriction for application in an adaptive framework where grid cells can be very small. In the meantime other Runge-Kutta schemes with up to five stages are in use and show satisfactory behaviour, especially when used in a multigrid environment for steady problems. For the computation of unsteady flows, such as pitching airfoils, the restrictions due to the CFL condition are still too strong. One way to overcome the limitations of explicit time stepping schemes is the use of parallel computers, which is easy in the case of finite volume approximations because domain decomposition is natural. A grid partitioner was developed in connection with an intelligent load balancing algorithm to re-divide and re-distribute grid patches depending on the load of the processors used. In that framework a parallel computation is easily done in an environment consisting of a cluster of workstations running PVM or P4 while the machines are still occupied by other users.
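
Why a CFL limit of one is so restrictive on adapted grids can be seen from a small calculation. The sketch below assumes a simple one-dimensional estimate of the admissible time step, cell size divided by the fastest local wave speed; the function name and the sample numbers are hypothetical.

```python
def explicit_time_step(cell_sizes, wave_speeds, cfl=1.0):
    """Largest global time step allowed by the CFL condition:
    the minimum over all cells of cfl * h / (fastest local wave speed)."""
    return min(cfl * h / s for h, s in zip(cell_sizes, wave_speeds))

# Hypothetical grid: two coarse cells, then the same grid with one cell
# refined by a factor of 100 through adaption.
dt_coarse = explicit_time_step([0.1, 0.1], [2.0, 2.0])
dt_adapted = explicit_time_step([0.1, 0.1, 0.001], [2.0, 2.0, 2.0])
```

A single refined cell reduces the global time step by the refinement factor, so a time-accurate explicit computation needs correspondingly more steps over the whole grid.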

In figure 5 the grid of a channel with forward facing step is shown. The grid partitioner has divided the grid into 59 patches which contain nearly the same number of nodes. The possible speedup is documented in the diagram of figure 6 where speedup vs. number of processors is shown for an Intel PARAGON. The flow is the supersonic test case by Woodward and Colella [35] as discussed also in section 6.1. As can be seen from figure 6 the present approach towards parallelism through domain decomposition leads to a very efficient method.


Figure 5: Grid partitioning


Figure 6: Speedup on an Intel PARAGON

For  use  on  conventional  machines  an  implicit  time 
stepping  scheme  according  to 


(r+i)  -u(r) 

tn  +  l  _in 


-g2iv,<p)- 


u(r-i) 


+0  . 


where  rj  = 


and 


g2{v><l>) 


=  /  iS  if 

\  1  if  ^  =  0 


if  (f)  =  I 
if  <p  =  o 


was  designed. 

The numerical flux functions are evaluated at the time level
'n + 1', whereby a linearisation is necessary which leads to a
linear system of equations in the form


A..Au"+  y  =  i  =  , 

=11  — '  Z—/  =ij  -1-11 

ieAflO 

where  Auf  =  -  uf  and  A..,  B..  E  Con- 

—1  —I  — *  =it’=ij  ^ 

sequently,  for  each  time  step  a  linear  system 


$$A y = b \qquad (3)$$

has to be solved, where $A$ is a large sparse non-symmetric matrix.
For the solution of the system (3) the GMRES algorithm developed by
Saad and Schultz [19], [20] is used. To this end, the system is
transformed into an equivalent minimisation problem. First, we define
the function $f : \mathbb{R}^n \to \mathbb{R}$,
$f(y) = \|b - A y\|_2$, and choose an arbitrary initial vector $y_0$.
Starting with $m = 0$ the residual
$r_m = \min_{y = y_0 + z,\ z \in K_m} f(y)$ is computed, where

$$K_m(A, \mathbf{r}_0) := \operatorname{span}\{\mathbf{r}_0, A\mathbf{r}_0, A^2\mathbf{r}_0, \ldots, A^{m-1}\mathbf{r}_0\}$$

denotes the $m$-th Krylov subspace and $\mathbf{r}_0 = b - A y_0$.
Now we increase $m$ until $r_m$ is below a given tolerance. Then we
compute the optimal approximate solution

$$y_m = \arg\min_{y = y_0 + z,\ z \in K_m} f(y).$$

Considering the fact that the expense to calculate the residual
increases with the Krylov subspace dimension it is efficient to limit
this dimension. If this limit is reached before the tolerance is met,
the approximation $y_m$ has to be calculated and used as the initial
value for a repetition of the procedure. This technique is called
"GMRES with restart".
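
The restart procedure can be sketched in a few lines of plain Python. This is an illustrative dense-matrix version using the usual Arnoldi process with Givens rotations, not the implementation used in the code described here; the names `matvec` and `gmres_restart` and the default parameters are assumptions, and a nonsingular matrix is assumed throughout.

```python
import math

def matvec(A, x):
    """Dense matrix-vector product (A given as a list of rows)."""
    return [sum(a * xj for a, xj in zip(row, x)) for row in A]

def gmres_restart(A, b, y0, m=5, tol=1e-10, max_restarts=200):
    """GMRES(m): minimise ||b - A y||_2 over y0 + K_m, then restart."""
    y = list(y0)
    for _ in range(max_restarts):
        r = [bi - ai for bi, ai in zip(b, matvec(A, y))]
        beta = math.sqrt(sum(ri * ri for ri in r))
        if beta < tol:                      # true residual small enough
            break
        V = [[ri / beta for ri in r]]       # orthonormal Krylov basis
        H = [[0.0] * m for _ in range(m + 1)]
        cs, sn = [0.0] * m, [0.0] * m       # Givens rotations
        g = [beta] + [0.0] * m              # rotated right-hand side
        k_used = 0
        for k in range(m):
            w = matvec(A, V[k])
            for i in range(k + 1):          # modified Gram-Schmidt
                H[i][k] = sum(wi * vi for wi, vi in zip(w, V[i]))
                w = [wi - H[i][k] * vi for wi, vi in zip(w, V[i])]
            h_next = math.sqrt(sum(wi * wi for wi in w))
            for i in range(k):              # apply earlier rotations
                t = cs[i] * H[i][k] + sn[i] * H[i + 1][k]
                H[i + 1][k] = -sn[i] * H[i][k] + cs[i] * H[i + 1][k]
                H[i][k] = t
            d = math.hypot(H[k][k], h_next)  # d > 0 for nonsingular A
            cs[k], sn[k] = H[k][k] / d, h_next / d
            H[k][k] = d
            g[k + 1] = -sn[k] * g[k]        # |g[k+1]| = residual norm
            g[k] = cs[k] * g[k]
            k_used = k + 1
            if abs(g[k + 1]) < tol or h_next < 1e-14:
                break
            V.append([wi / h_next for wi in w])
        z = [0.0] * k_used                  # back substitution
        for i in reversed(range(k_used)):
            s = g[i] - sum(H[i][j] * z[j] for j in range(i + 1, k_used))
            z[i] = s / H[i][i]
        for j in range(k_used):             # y = y0 + V z
            y = [yi + z[j] * vj for yi, vj in zip(y, V[j])]
    return y
```

The parameter `m` is the limit on the Krylov subspace dimension; when it is reached before the tolerance, the current approximation becomes the initial vector of the next cycle, as described above.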

Since the convergence rate of an iterative method depends on the
condition number of the matrix $A$, an incomplete LU-factorisation is
used as a preconditioner in order to decrease the condition number.
Here the incomplete LU-factorisation is a pair of a lower left ($L$)
and an upper right ($U$) matrix satisfying the following three
conditions:

1. $U_{ii}$ is the unit matrix for all $i$,

2. $L_{ij} = U_{ij} = A_{ij}$, if $A_{ij}$ is a null matrix,

3. $(LU)_{ij} = A_{ij}$, if $A_{ij}$ is not a null matrix,

and the linear system is transformed into

$$A\,U^{-1}L^{-1}\,\hat{y} = b, \qquad y = U^{-1}L^{-1}\,\hat{y}.$$

A detailed description of this preconditioned GMRES algorithm in
comparison with other implicit and explicit finite volume schemes is
presented in [17].

5  Adaptive  concepts 

5.1  Refinement  algorithms 

Over the years experience was gained with several different
refinement/coarsening strategies for triangulations. This work is
documented in [26], [27], [11] and [13]. Numerical experience
indicated that, at least for reliable Euler grids, a version of the
isotropic red-green refinement as described in [14] gives superior
grids. In this refinement strategy triangles which have to be refined
are red-refined according to figure 7. Remaining triangles with two
hanging nodes are also red-refined before green refinement turns the
triangulation again into a conforming one. Note that at the beginning
of each refinement cycle the previous green refinements are removed in
order to keep the triangulations stable, i.e. in order to avoid too
small angles occurring after several adaptation cycles.
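
The two elementary operations can be sketched as follows. This is only a geometric illustration for a single triangle given by its vertex coordinates; the function names are hypothetical, and the bookkeeping of neighbours and hanging nodes that a real grid data structure needs is omitted.

```python
def midpoint(p, q):
    return ((p[0] + q[0]) / 2.0, (p[1] + q[1]) / 2.0)

def area(tri):
    (x0, y0), (x1, y1), (x2, y2) = tri
    return 0.5 * abs((x1 - x0) * (y2 - y0) - (x2 - x0) * (y1 - y0))

def red_refine(tri):
    """Split a triangle into four similar triangles via edge midpoints."""
    a, b, c = tri
    ab, bc, ca = midpoint(a, b), midpoint(b, c), midpoint(c, a)
    return [(a, ab, ca), (ab, b, bc), (ca, bc, c), (ab, bc, ca)]

def green_refine(tri, edge):
    """Bisect a triangle with one hanging node.

    edge is the index (0, 1 or 2) of the edge from tri[edge] to
    tri[(edge + 1) % 3] that carries the hanging node; the node is
    connected to the opposite vertex.
    """
    a, b, c = tri[edge], tri[(edge + 1) % 3], tri[(edge + 2) % 3]
    m = midpoint(a, b)
    return [(a, m, c), (m, b, c)]

parent = ((0.0, 0.0), (1.0, 0.0), (0.0, 1.0))
children = red_refine(parent)
```

Red refinement preserves the angles of the parent triangle, while each green bisection halves one angle; this is why repeated green refinement without the removal step described above would degrade the triangulation.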

In a corresponding re-coarsening strategy several topological
configurations can be identified in which points can be removed from
the grid. It can be shown, see [14], that a refined mesh can always be
completely coarsened back to its initial state.

Note that in order to keep the process of coarsening conservative it
is necessary to use interpolation procedures respecting conservation.
Examples of such procedures are given in [14].


Figure 7: Red (left) and green refinement of a triangle


5.2  Error  indicators 

In contrast to classical approaches in CFD the DLR-τ-Code relies on
residual-based error indicators which were developed in a series of
papers [26], [27], [11], [28]. This type of indicator was developed
for use in codes for the Euler equations but we are currently working
on extensions towards the Navier-Stokes equations. If
$\mathcal{L}u = 0$ denotes the abstract form of the Euler equations in
which $\mathcal{L}$ is the corresponding differential operator of
first order and $u_h$ denotes the linear interpolant of the flow
variables on triangle $T$, then the local error of the numerical
method under consideration is defined by $e_T := u_h - u$. If the
numerical approximation $u_h$ is inserted into the differential
equation the deviation from the zero vector is a measure of closeness
to the exact solution. The quantity

$$r_h := \mathcal{L}u_h$$

is therefore called the residual. It was shown in [28] that a
two-sided error bound of the form

$$C_1 \|r_h\|_{D^*(T)} \le \|e_T\|_{L^2(T)} \le C_2 \|r_h\|_{D^*(T)}$$

can be proved where $\|\cdot\|_{D^*(T)}$ denotes the dual graph-norm.
First numerical results were presented in [28] which indicated that
the use of the dual graph-norm leads to similar results as the use of
the weighted $L^2$-norm

$$h\,\|r_h\|_{L^2(T)},$$

$h$ denoting the length of the longest edge of $T$, which was used for
heuristic reasons before, see [11], [26], [27]. In [30] Süli was able
to prove that the dual graph-norm is indeed essentially equivalent to
$h\,\|r_h\|_{L^2(T)}$ and since this locally weighted $L^2$-norm is
much easier to implement this is the error indicator of our choice.
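
To make the weighted $L^2$ indicator concrete, the following sketch evaluates it on one triangle for a scalar model problem. It assumes the steady linear advection operator $\mathcal{L}u = a \cdot \nabla u$ in place of the Euler operator, so it is a simplified stand-in rather than the indicator as implemented in the code; the function name and arguments are hypothetical. For a linear interpolant the gradient, and hence the residual, is constant on the triangle, so the $L^2$-norm is the absolute residual times the square root of the area.

```python
import math

def residual_indicator(tri, u, a):
    """h * ||a . grad(u_h)||_L2(T) for a linear interpolant on one triangle.

    tri: three vertex coordinates ((x0, y0), (x1, y1), (x2, y2))
    u:   nodal values (u0, u1, u2)
    a:   constant advection velocity (ax, ay), an assumed model operator
    """
    (x0, y0), (x1, y1), (x2, y2) = tri
    det = (x1 - x0) * (y2 - y0) - (x2 - x0) * (y1 - y0)
    area = 0.5 * abs(det)
    # Constant gradient of the linear interpolant.
    ux = ((u[1] - u[0]) * (y2 - y0) - (u[2] - u[0]) * (y1 - y0)) / det
    uy = ((u[2] - u[0]) * (x1 - x0) - (u[1] - u[0]) * (x2 - x0)) / det
    residual = a[0] * ux + a[1] * uy
    # h is the length of the longest edge of the triangle.
    h = max(math.hypot(x1 - x0, y1 - y0),
            math.hypot(x2 - x1, y2 - y1),
            math.hypot(x0 - x2, y0 - y2))
    return h * abs(residual) * math.sqrt(area)

tri = ((0.0, 0.0), (1.0, 0.0), (0.0, 1.0))
eta = residual_indicator(tri, (0.0, 1.0, 0.0), (1.0, 0.0))
```

Triangles would then be flagged for refinement where the indicator is large and for coarsening where it is small; the thresholds are a separate design choice not covered by this sketch.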

The additional problem occurring with the Navier-Stokes equations
lies in the second derivatives inherent in the diffusive fluxes.
Although we are currently not able to prove error bounds it seems
possible to approximate the second derivatives in the computation of
the residual in a measure-theoretic way by sampling the jumps of the
first derivatives across the edges in normal direction. This type of
error indicator was inspired by the work of C. Johnson et al. on the
adaptive streamline diffusion finite element method, see [7], [12],
and developed by Göhner and Warnecke [9]. We are currently
investigating this type of indicator for compressible flow [29].

6  Numerical  results 

6.1  Unsteady  flow  in  a  channel 

To show the ability of the code to adaptively resolve the flow
features we consider the test case of $Ma_\infty = 3$ flow through a
channel with forward facing step. This case was used by Woodward and
Colella [35] for extensive comparison of finite difference schemes. In
figure 8 four grids at consecutive times are shown together with the
corresponding density distributions. As can be seen the adaptive
algorithm has not only resolved all of the flow features but also
succeeded in coarsening those parts of the meshes which were
previously refined. The computation was done with the parallel τ-Code
on a cluster of workstations using PVM.

6.2  Pitching  airfoil 

Figures 9 and 10 show the results of a calculation of an unsteady
inviscid flow about the NACA0012 airfoil in comparison with
experimental data. In this case the airfoil is pitching harmonically
about the quarter chord point with a reduced frequency of
$k = 0.1628$ and an amplitude of $2.51^\circ$. The freestream Mach
number is $Ma_\infty = 0.755$ and the angle of attack initially is
$0.016^\circ$. Consequently, the time-dependent angle of attack is
$\alpha(t) = 0.016^\circ + 2.51^\circ \sin(0.1628\,t)$.
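
As a small worked example, the prescribed motion can be evaluated directly from the formula above. The constants are those quoted in the text; the function names and the assumption that $t$ is measured in the same dimensionless units as the formula are illustrative.

```python
import math

ALPHA_MEAN_DEG = 0.016   # initial (mean) angle of attack from the text
ALPHA_AMP_DEG = 2.51     # pitching amplitude from the text
FREQUENCY = 0.1628       # coefficient of t in the formula above

def angle_of_attack_deg(t):
    """alpha(t) = 0.016 deg + 2.51 deg * sin(0.1628 t)."""
    return ALPHA_MEAN_DEG + ALPHA_AMP_DEG * math.sin(FREQUENCY * t)

def cycle_period():
    """Duration of one pitching cycle in the time units of the formula."""
    return 2.0 * math.pi / FREQUENCY

# Time window of the third cycle of motion used for the comparison.
third_cycle = (2.0 * cycle_period(), 3.0 * cycle_period())
```

The maximum angle of attack, $2.526^\circ$, is reached a quarter period after the start of each cycle.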

Figure 9 shows the obtained instantaneous pressure distribution in
comparison with the experimental data for several times during the
third cycle of motion. Figure 10 shows the comparison of the lift
coefficient and the moment coefficient vs. the time-dependent angle of
attack. The computational data are very close to the experimental
ones. Note that no diffusive effects were included in this
calculation.


Figure  8:  Evolution  of  the  adaptive  grid  and  corresponding  solutions 


Figure 9: Comparison of the instantaneous pressure distribution
between the numerical computation (inviscid) and experimental data

Figure 10: Comparison of lift and moment coefficient vs.
time-dependent angle of attack between the numerical computation
(inviscid) and experimental data

6.3 Viscous flow about a NACA0012 airfoil

The next case was chosen to test the method and the adaptation
algorithm for viscous flow computations. We consider the steady
laminar flow about a NACA0012 airfoil with a Reynolds number of 500,
a Prandtl number of 0.72, a reference temperature of 273 K, a
freestream Mach number of 0.85 and an angle of attack of 0°. The
obtained Mach number distribution is shown in figure 11 and figure 12
presents the adapted grid. The adaptation indicator used is based on
the finite element residual of the Navier-Stokes equations.

6.4  3-D  transonic  wing 

As a three dimensional test case the inviscid flow about an Onera M6
wing with $Ma_\infty = 0.84$ and an angle of attack of 3.06° is
considered. Figure 13 shows isolines of the Mach number distribution
on a coarse hybrid grid with fewer than 40,000 gridpoints. Figure 14
shows the distribution of the computed $L_2$-norm of the finite
element residual for the same solution. It can be seen that the far
too coarsely resolved leading and trailing edges, the tip region as
well as the shock were picked up. Thus, also in three space dimensions
the finite element residual of the Euler equations can be used as an
adaptation indicator.

Figure 12: NACA0012 airfoil - Adapted grid.

To accelerate convergence to steady state, agglomeration multigrid as
described by Venkatakrishnan and Mavriplis [32] is used. The coarse
grid discreti-


Figure 14: $L_2$-norm of the residual displayed on the surface.
Darker regions indicate higher $L_2$-norm.

Figure 15: $L_1$-norm of the density residual vs. CPU time.

Plot legend: M6 wing, singlegrid and multigrid

Figure 16: Lift coefficient vs. CPU time.
6.5  Radial  recovery  functions 

The  accuracy  of  recovery  algorithms  based  on  radial 
basis  functions  can  be  seen  in  an  application  to  a 
simple  model  problem.  Consider  the  linear  partial 
differential  equation 


$$\partial_t u + a \cdot \nabla u = 0, \qquad u(x, 0) = u_0(x), \quad x \in \mathbb{R}^2$$


and 

uo(s) 


r  -^R  +  l  ;  E<om 

0  ;  else. 


where  R  :=  (xi  -  |)“  +  (xo  -  j)^-  The  initial  func¬ 
tion  is  a  cone  of  unit  height  which  is  rotated  around 
the  origin  under  the  action  of  the  differential  equa¬ 
tion.  Measuring  the  remaining  cone  height  after 


180° of rotation gives a reliable criterion concerning the accuracy
of the recovery. In figure 17 the grid used is shown with a numerical
solution of a finite volume approximation without recovery. The
remaining cone height is a disappointing 0.382 and the shape of the
cone is dramatically corrupted. Using a linear polynomial recovery
algorithm following Durlofsky, Engquist, Osher [6] results in the
solution shown in the left part of figure 18. The cone height now is
0.635 but the shape of the cone is still lacking regularity. Using the
thin plate spline recovery as described in 4.3 results in a cone with
proper shape and height 0.886. This solution is shown in the right
part of figure 18.

Figure 17: Grid and solution without recovery

Figure 18: Solutions with linear polynomial (left) and thin plate
spline recovery

Experiments with radial basis functions with compact support have
indicated even better numerical results than those obtained with the
thin plate spline. Additionally, methods for the fast construction of
recovery functions are currently being developed [3] so that there is
hope that these functions can be implemented in the DLR-τ-Code in the
near future.

SUMMARY

A  code  to  calculate  unsteady  aerodynamic 
loads  on  non-uniformly  moving  3-D 
isolated  wings  has  been  prepared.  The 
Euler  equations  are  solved  by  means  of 
a  time-accurate  Finite-Volume  method 
with  second  order  central  spatial 
discretization  and  Runge-Kutta  time 
integration.  The  code  has  been 
implemented  in  a  parallel  supercomputer. 
The  numerical  scheme  used  together  with 
some  representative  results  are 
presented. 

LIST OF SYMBOLS

$c$ = local chord
$c_0$ = reference length
$C_l$ = section lift coefficient = lift / ($q_\infty c$)
$C_p$ = pressure coefficient = $(p - p_\infty) / q_\infty$
$d$ = dissipative flux
$D$ = dissipative operator
$E$ = specific total energy
$F, G, H$ = components of Euler flux vector
$k$ = reduced frequency = $\omega c / 2V_\infty$
$k^{(2)}, k^{(4)}$ = artificial viscosity constants
$L$ = scaling factor
$M_\infty$ = free-stream Mach number
$n$ = surface normal unit vector
$p$ = static pressure
$Q$ = convective operator
$q_\infty$ = free-stream dynamic pressure
$R$ = residual
$R^*$ = averaged residual
$S$ = cell face area
$t$ = time
$t^*$ = dimensionless time = $t V_\infty / c_0$
$U$ = vector of conservative variables
$u, v, w$ = components of flow velocity
$V_\infty$ = free stream velocity
$V_\Sigma$ = mesh velocity
$\alpha_0$ = mean angle of attack
$\alpha_1$ = pitching motion amplitude
$\beta$ = Runge-Kutta coefficients
$\Sigma$ = cell boundary
$\epsilon$ = residual averaging parameter
$\epsilon^{(2)}, \epsilon^{(4)}$ = artificial viscosity parameters
$\lambda, \mu, \sigma$ = spectral radius of flux Jacobian matrices in
$\xi$, $\eta$, and $\zeta$ directions
$\rho$ = air density
$\xi, \eta, \zeta$ = curvilinear coordinates
$\nu$ = shock wave sensor
$\Omega$ = cell volume
$\omega$ = frequency of oscillation

Subscripts
$i$ = cell column
$j$ = cell row
$k$ = cell plane
$n$ = cell face
$q$ = Runge-Kutta stage

1. INTRODUCTION

Aeroelastic problems appear to be of increasing importance in the
design of aircraft. The size of the structures and their elastic
behavior, the aerodynamic interference of different components,
transonic effects, structural and control nonlinearities, etc., are
becoming a severe limiting factor. There is thus a strong need to
apply sophisticated and reliable aeroelastic simulation tools already
in the early design stage of a new development. These tools have to
couple highly accurate, robust and user friendly CFD codes with
Structural Dynamics software. Whereas the latter is already well
established, the former still need development before a generally
recognized standard code is available.

To clear a configuration of aeroelastic problems, a very large number
of cases have to be run. Time accurate CFD codes are generally
considered to be computationally too expensive for industrial
application. Potential theory is mainly used, whereas the next level
of approximation, i.e. the Euler equations with or without boundary
layer coupling, is only now slowly starting to find its way into the
design offices despite the better approximation it provides. The
application of high performance parallel computers to this kind of
problem is obviously extremely interesting, not only because it
allows larger problems to be tackled in a shorter time but also
because it opens the possibility of performing parametric studies in
a reasonable time.

A time-accurate Euler code has been prepared to calculate inviscid
transonic flow around oscillating 3-D wings. The code has been
implemented on the NWT (Numerical Wind Tunnel) parallel supercomputer
of the National Aerospace Laboratory in Japan. The objective of the
present work has been to study the influence on the unsteady results
of the different parameters that control the calculation.

The  following  presents  a  brief 
description  of  the  scheme  and  its 
parallel  implementation,  together  with 
some  results. 

2.  NUMERICAL  SCHEME 

Among the different schemes which have been developed to solve the
unsteady 3-D Euler equations [1-10], the very popular one of Jameson
[10] has been selected for this study. In the following a brief
description of the implementation made here is given. More details
can be found in [11].

2.1 Governing Equations

The flow is assumed to be governed by the three-dimensional
time-dependent Euler equations, which for a moving domain $\Omega$
with boundary $\Sigma$ may be written in integral form as:

$$\frac{\partial}{\partial t}\int_\Omega U \, d\Omega + \oint_\Sigma \left[(F, G, H) - U\,\mathbf{V}_\Sigma\right] \cdot \mathbf{n} \, dS = 0 \qquad (1)$$

where $U$ is the vector of conservative flow variables; $(F, G, H)$
are the three components of the Euler flux vector;
$\mathbf{V}_\Sigma$ is the velocity of the moving boundary; and
$\mathbf{n}$ is the unit exterior normal vector to the domain.

Here $\rho$, $p$, $(u, v, w)$ and $E$ respectively denote the
density, pressure, cartesian velocity components of the flow, and
specific total energy.

In order to close the system of equations (1) a sixth equation is
needed which is obtained from the thermodynamic relationships for a
perfect gas

$$p = (\gamma - 1)\,\rho\left[E - \tfrac{1}{2}\left(u^2 + v^2 + w^2\right)\right] \qquad (3)$$


2.2  Spatial  Discretization 

The domain around the wing is divided into an O-H mesh of hexahedral
cells, for which the body-fitted curvilinear coordinates $\xi$,
$\eta$, $\zeta$ respectively wrap around the wing profile
(clockwise), run normal to and away from it, and run along the span.
Figure 1 shows an example. Individual cells are denoted by the
subscripts $i,j,k$, respectively corresponding to each of the axes in
the transformed plane.


The integral equation (1) is applied separately to each cell.
Assuming that the independent variables are known at the center of
each cell, calculating the flux vector as the average of the values
in the cells on either side of the face, and taking the mesh
velocities as the average of the velocities of the four nodes
defining the corresponding face, the following system of ordinary
differential equations (one per cell) results:

$$\frac{d}{dt}\left(\Omega_{i,j,k}\,U_{i,j,k}\right) + Q_{i,j,k} = 0 \qquad (4)$$

where the convective operator $Q_{i,j,k}$

$$Q_{i,j,k} = \sum_{n=1}^{6} \left[(F, G, H) - U\,\mathbf{V}_\Sigma\right]_n \cdot \mathbf{n}_n\, S_n \qquad (5)$$

is a function of $U_{i,j,k}$ and of the values in the six
neighbouring cells $U_{i\pm1,j,k}$, $U_{i,j\pm1,k}$ and
$U_{i,j,k\pm1}$. Schemes constructed in this manner reduce to central
difference schemes on cartesian meshes, and are second order accurate
if the mesh is sufficiently smooth.


This formulation is inherently non-dissipative (ignoring the effect
of numerical boundary conditions), so that dissipative fluxes
$D_{i,j,k}$ have been added. The well known model of Jameson [12] is
used. The idea of this adaptive scheme is to add 4th order viscous
terms throughout the domain to provide a base level of dissipation
sufficient to prevent non-linear instabilities, but not sufficient to
prevent oscillations in the neighborhood of shock waves. In order to
capture shock waves additional 2nd order viscosity terms are added
locally by a sensor designed to detect discontinuities in pressure.
To avoid overshoots near the shock waves produced by the combined
presence of the 2nd and 4th order terms, the latter are cut off in
that area by an appropriate switch.


For the dissipative flux across the face separating cells $i,j,k$
and $i+1,j,k$ we have (for the other faces similar expressions
apply):

The dissipation coefficients $\epsilon^{(2)}$ and $\epsilon^{(4)}$
are calculated as

where $L$ is a scaling factor which depends on the spectral radius of
the flux Jacobian matrix in the $\xi$ direction

(11)

as a sensor of the presence of a shock wave.


2.3  Time  Integration 

The system of ODEs in (4) is solved by means of an explicit 5-stage
Runge-Kutta scheme with two evaluations of the dissipation terms,
which is second order accurate in time and can be shown [11] to have
good diffusion and dispersion error characteristics and a lower
computational cost per time step than other schemes with fewer
stages.


2.4 Residual Averaging

This explicit time-integration scheme has a time step limit that is
controlled by the size of the smallest cell.


$\Delta t_{i,j,k}$ ... $\Omega_{i,j,k}$ (13)


Even though the CFL number of the 5-stage Runge-Kutta scheme is of
the order of 4, the resulting $\Delta t$'s are usually too small for
practical applications. This restriction can be relaxed by using a
technique of residual averaging [13] which gives an implicit
character to the time-integration scheme. Before each time step the
residuals $R_{i,j,k} = Q_{i,j,k} - D_{i,j,k}$ are replaced by
modified residuals $R^*_{i,j,k}$ which are calculated by means of an
ADI method:

$$(1 - \epsilon_{i,j,k}\,\delta_\xi^2)(1 - \epsilon_{i,j,k}\,\delta_\eta^2)(1 - \epsilon_{i,j,k}\,\delta_\zeta^2)\,R^*_{i,j,k} = R_{i,j,k} \qquad (14)$$

where $\delta_\xi^2$, $\delta_\eta^2$ and $\delta_\zeta^2$ are the
second difference operators in the $\xi$, $\eta$ and $\zeta$
directions and $\epsilon_{i,j,k}$ is the smoothing parameter [14]

$$\epsilon_{i,j,k} = \max\left\{\frac{1}{4}\left[\left(\frac{\Delta t}{\Delta t_{i,j,k}}\right)^2 - 1\right],\ 0\right\} \qquad (15)$$

with $\Delta t$ denoting the desired time step.

Within a linear analysis, this technique assures unconditional
stability for any size of the time step. However, as the resulting
effective Courant number becomes large the contribution of the
dissipation terms to the Fourier symbol goes to zero, and
consequently, the high frequencies introduced by the non-linearities
are undamped [15]. Thus the practical limit for the time step is
determined principally by the high frequency damping characteristics
of the integration scheme used. As the properties of the 5-stage
Runge-Kutta time-integration method are very good from this point of
view, CFL values as high as 240 have been successfully used, which
significantly decrease the calculation time needed for a typical
case.
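
The residual averaging step of this section can be sketched for a single grid line. The sketch assumes the commonly used form in which each ADI sweep solves a tridiagonal system $(1 - \epsilon\,\delta^2) R^* = R$ and in which the smoothing parameter is $\max\{\frac{1}{4}[(\Delta t / \Delta t_{local})^2 - 1], 0\}$; the constant $\epsilon$ along the line, the treatment of the line ends (neighbours outside the line are simply dropped) and the function names are assumptions made for this illustration.

```python
def smoothing_parameter(dt_desired, dt_local):
    """Smoothing parameter; zero when the local explicit limit suffices."""
    return max(0.25 * ((dt_desired / dt_local) ** 2 - 1.0), 0.0)

def smooth_residual_1d(R, eps):
    """Solve -eps*Rs[i-1] + (1 + 2*eps)*Rs[i] - eps*Rs[i+1] = R[i].

    One sweep of the implicit averaging along a grid line, solved with
    the Thomas algorithm. Terms reaching outside the line are omitted
    (an assumed boundary treatment).
    """
    n = len(R)
    diag, off = 1.0 + 2.0 * eps, -eps
    c = [0.0] * n          # modified upper diagonal
    d = [0.0] * n          # modified right-hand side
    c[0] = off / diag
    d[0] = R[0] / diag
    for i in range(1, n):  # forward elimination
        denom = diag - off * c[i - 1]
        c[i] = off / denom
        d[i] = (R[i] - off * d[i - 1]) / denom
    Rs = [0.0] * n
    Rs[-1] = d[-1]
    for i in range(n - 2, -1, -1):   # back substitution
        Rs[i] = d[i] - c[i] * Rs[i + 1]
    return Rs

# A desired time step three times the local explicit limit.
eps = smoothing_parameter(3.0, 1.0)
smoothed = smooth_residual_1d([0.0, 0.0, 1.0, 0.0, 0.0], eps)
```

In three dimensions the same sweep would be applied successively along the $\xi$, $\eta$ and $\zeta$ grid lines, each sweep taking the output of the previous one as its right-hand side.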

2.5 Freestream Capturing

For  the  scheme  to  satisfy  the  freestream 
capturing  condition  [16]  it  must  be 


which is the discrete form (consistent with the numerical scheme
employed here) of the Geometric Conservation Law as formulated by
Thomas and Lombard [17]. It states that the cell volumes must be
advanced in time in the same way as the fluid variables (even if they
could be calculated analytically at each time step) to prevent
grid-motion-induced errors in the numerical solution.


2.6 Boundary Conditions

The following boundary conditions are imposed:

a) Kinematic boundary condition on the wing surface. The pressure on
the surface is extrapolated from the internal points.

b) Symmetry condition at the k=1 plane.

c) Far field boundary condition in terms of Riemann invariants for a
one dimensional flow normal to the outer computational boundary.

3.  RESULTS 

In the following, results for the LANN wing are presented. This is a
high aspect ratio (AR=7.92) transport type wing with a 25°
quarter-chord sweep angle, a taper ratio of 0.4, and a variable 12%
supercritical airfoil section twisted from about 2.6° at the root to
about -2.0° at the tip. The geometry used for the computational model
is that of [18]. The results presented here correspond to the design
cruise condition: $M_\infty = 0.82$, $\alpha_0 = 0.6^\circ$. The wing
performs harmonic pitching oscillations about an axis at 62% root
chord with an amplitude of $\alpha_1 = 0.25^\circ$ and a reduced
frequency $k = 0.104$.

The calculation proceeds as follows: first an initial steady solution
is obtained and quality controlled; then the time-accurate
calculation is started and time-marched until the initial transients
are damped out and a harmonic solution is obtained (typically three
cycles of oscillation are needed); finally the results of the last
cycle are Fourier analyzed to extract the mean value and harmonics of
the different aerodynamic coefficients.
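
The final post-processing step can be illustrated with a short sketch that extracts the mean value and the first harmonic from samples taken at equal intervals over exactly one cycle. The function name is hypothetical, and which of the two harmonic coefficients is reported as the "real" and which as the "imaginary" part depends on the phase reference chosen for the motion, which is not specified here.

```python
import math

def mean_and_first_harmonic(samples):
    """Discrete Fourier analysis of one period of a signal.

    samples: values f(t_j) at t_j = j * T / N, j = 0..N-1, covering
             exactly one cycle of period T.
    Returns (mean, cosine coefficient, sine coefficient) such that
    f ~ mean + c1 * cos(2*pi*t/T) + s1 * sin(2*pi*t/T) + higher terms.
    """
    n = len(samples)
    mean = sum(samples) / n
    c1 = 2.0 / n * sum(f * math.cos(2.0 * math.pi * j / n)
                       for j, f in enumerate(samples))
    s1 = 2.0 / n * sum(f * math.sin(2.0 * math.pi * j / n)
                       for j, f in enumerate(samples))
    return mean, c1, s1

# Synthetic last-cycle data with known mean and first harmonic.
N = 64
signal = [0.3 + 0.5 * math.sin(2.0 * math.pi * j / N)
          - 0.2 * math.cos(2.0 * math.pi * j / N) for j in range(N)]
mean, c1, s1 = mean_and_first_harmonic(signal)
```

For a motion proportional to $\sin(\omega t)$, the sine coefficient is the component in phase with the motion and the cosine coefficient the component in quadrature.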

Because of the large memory and CPU time requirements of this type of
method, very few studies are available in the literature that assess
the relative influence on the unsteady results of the different
parameters that control the calculation. Taking advantage of the
benefits of parallelization to perform this task was one of the main
objectives of the present work.

3.1  Artificial  Viscosity 

Calculations have been done for an 80x16x30 grid with different
amounts of artificial viscosity, which has been varied by means of
the two coefficients $k^{(2)}$ and $k^{(4)}$ in (8) and (9). Results
in terms of mean part and real and imaginary parts of the first
harmonic of the pressure distributions around wing sections at 17.5%
and 82.5% semi-span are presented in Figures 2 to 5. Logically the
main effect is on the shock resolution which in turn influences the
magnitude and positions of the corresponding peaks in the first
harmonic component.


3.2  Grid  Density 

Two different grids, namely 80x16x30 and 160x32x30, have been
considered. The spanwise grid distribution and outer boundary
location were kept the same for both cases, with 20 grid planes on
the wing and 10 grid planes between the wing tip and the side
boundary of the computational domain which is located at two
semi-spans from the plane of symmetry. The outer boundary around the
root section is at 9 chords. Results are shown in Figures 6 and 7,
where the first harmonic of the pressure distributions around wing
sections at 17.5% and 82.5% semi-span is presented. It can be seen
that the influence is large as a consequence of the better shock
resolution of the finer grid. On the other hand, as was to be
expected, the discrepancies are much smaller when integrated along
the chord to obtain sectional forces and moments, as can be seen in
Figure 8 for the lift coefficient.

3.3  Time  Step  Size 

Figures 9 and 10 respectively show the real and imaginary parts of
the first harmonic of the pressure distribution around the wing
section at 92.5% semi-span calculated with the 80x16x30 grid using
dimensionless time-step sizes $\Delta t^*$ ranging from 0.002 to 0.01
(which correspond to CFLs from 30 to 150). This section at the wing
tip has been selected because at its trailing edge the smallest cells
are to be found, for which the stability limit should first be
reached in accordance with (13). This is indeed the case as can be
clearly seen in the zoomed region. Outside of this area the results
are time-step independent. Fortunately this instability is very well
behaved, growing only at a very slow rate at the same time that it
spreads inboard and towards the trailing edge, so that meaningful
engineering calculations could be performed at larger $\Delta t^*$
without a significant loss of accuracy.

3.4  Deforming  vs.  Rigid  Moving  Grids 

In the present method the instantaneous grid is computed by
deformation of an initial steady mesh in such a way that the grid
points near the wing surface are forced to closely follow the wing
(whose motion is known as a function of time) whereas the
displacements of grid points far from the wing surface gradually
decrease and vanish at the outer boundary.

Dynamic grid deformation such as this is computationally expensive as
it involves re-calculation of grid position, kinematics and metrics
at each time step. For those cases in which the wing has no elastic
deformations it is also possible to perform the calculation on a grid
that moves with the wing as a rigid body. This option, although
theoretically less accurate than the former, is obviously
computationally less expensive. To evaluate its influence on the
results, calculations have been performed both with the usual
deforming grid and with a rigid one. It has been found that
differences are negligible. No figure is given because the
differences are within the resolution of the graph.

3.5  Freestream  Capturing 

Calculations have been performed both imposing and not imposing the
freestream capturing condition (16). In the latter case the cell
volumes at each time step have been calculated analytically. Again
the differences in the results are totally negligible.

4.  PARALLEL  IMPLEMENTATION  IN  NWT 

The scheme presented above was
originally developed on a Cray-YMP
computer and has been implemented on the
NWT (Numerical Wind Tunnel) machine of
the National Aerospace Laboratory [19].
This is a distributed memory parallel
machine with 140 vector processing
elements (PE) and two Control Processors
connected by a cross-bar network.

Each PE is itself a vector supercomputer
similar to the Fujitsu VP400, with a peak
performance of 1.7 GFlops, and includes
256 Mbytes of main memory, a vector
unit, a scalar unit and a data mover
which communicates with other PEs. The
resulting totals for NWT are a peak
performance of 236 GFlops and 35 GBytes
of main memory.

The code has been parallelized using
Fujitsu NWT FORTRAN, which is a FORTRAN
77 extension designed to perform
efficiently on distributed memory
parallel computers. The extension is
realized by compiler directives. The
basic execution method is the
spread/barrier method.

The present scheme always has two
directions in which the computation can
be performed simultaneously. Accordingly
we can use one direction for
vectorization and the other for
parallelization. For the O-H grid used
here the most natural way of
parallelizing, i.e. assigning different
vertical grid planes to different
processing elements, has been used. We
thus divide every array evenly along the
k-index and assign each part to a
different PE. The vectorization is made
in the i-direction, which usually has
the largest number of cells.

With this partition, i-derivatives and
j-derivatives can be computed in each PE
without any communication. The
computation of k-derivatives in PE_n
requires data stored in PE_{n+1} and
PE_{n-1} which, in principle, would imply
the need to communicate with the neighbor
PEs, thus increasing the overhead. This
is avoided by using overlapped
partitioned arrays. Array partitions are
defined in such a way that adjacent
partitioned ranges automatically overlap
and have some common indices (with a
depth depending on the stencil), so that
copies of selected data at the interfaces
between two PEs are stored in both local
memories. In this way k-derivatives can
also be computed in each PE without any
communication. At the end of each
calculation cycle, data in the overlap
range of the partitioned arrays is
harmonized by copying its value from the
parent PE.
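
The overlapped partitioning described above can be sketched in a few lines. This is only an illustration under stated assumptions: a one-dimensional array along k stands in for the three-dimensional arrays, Python lists stand in for the local memories of the PEs, a central difference stands in for the actual stencil, and all function names are hypothetical (the paper relies on NWT FORTRAN compiler directives, which are not reproduced here).

```python
def partition_with_overlap(n, n_pe, depth):
    """Split indices 0..n-1 evenly; return (owned, stored) index ranges for each PE."""
    base, extra = divmod(n, n_pe)
    ranges, start = [], 0
    for p in range(n_pe):
        stop = start + base + (1 if p < extra else 0)
        stored = (max(0, start - depth), min(n, stop + depth))
        ranges.append(((start, stop), stored))
        start = stop
    return ranges


def scatter(u, ranges):
    """Give every PE a private copy of its stored range (owned part plus overlap)."""
    return [u[lo:hi] for (_, (lo, hi)) in ranges]


def local_k_derivative(local, owned, stored):
    """Central difference at owned points, using only data held in local memory."""
    start, stop = owned
    lo, _ = stored
    deriv = {}
    for k in range(start, stop):
        i = k - lo
        if 0 < i < len(local) - 1:  # skip the two global boundary points
            deriv[k] = 0.5 * (local[i + 1] - local[i - 1])
    return deriv


def harmonize(locals_, ranges):
    """Refresh every stored entry from the PE that owns that index (the parent PE)."""
    owner_value = {}
    for local, ((start, stop), (lo, _)) in zip(locals_, ranges):
        for k in range(start, stop):
            owner_value[k] = local[k - lo]
    for local, (_, (lo, hi)) in zip(locals_, ranges):
        for k in range(lo, hi):
            local[k - lo] = owner_value[k]


n_k, n_pe = 12, 3
u = [float(k * k) for k in range(n_k)]
ranges = partition_with_overlap(n_k, n_pe, depth=1)
locals_ = scatter(u, ranges)

# Each PE computes its k-derivatives independently of the others.
derivs = {}
for local, (owned, stored) in zip(locals_, ranges):
    derivs.update(local_k_derivative(local, owned, stored))
```

The overlap depth of 1 matches the three-point stencil used in the sketch; a wider stencil would need a deeper overlap. After each PE updates its owned values, one call to `harmonize` restores consistency of the copies, which is the only data exchange in the cycle.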

The procedure explained above can be
maintained throughout the code except in
the residual averaging subroutine, where
the alternating directions method (ADI)
employed prevents its use, as it requires
a sequential calculation. The inversions
in the i- and j-directions can be done
in each PE independently so that the
k-parallelization can be maintained, with
vectorization in the j-direction for the
i-inversion and in the i-direction for
the j-inversion. As for the k-inversion,
the process must be sequential in the
k-direction, so we transfer the
affected data from a k-partition to a
j-partition. Then we can compute the
k-inversion on each PE with vectorization
in the i-direction. At the end of the
calculation the data is transferred back
to a k-partition. Figure 11 depicts the
calculation flow.
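
The change of partition used for the k-inversion can be sketched as follows. The sketch makes several simplifying assumptions that are not in the paper: the i-direction is dropped, nested Python lists stand in for distributed arrays, list concatenation stands in for the data transfer between PEs, and a simple first-order recursion in k stands in for the actual implicit inversion. All function names are hypothetical.

```python
def split(n, parts):
    """Even split of range(n) into contiguous (start, stop) blocks."""
    base, extra = divmod(n, parts)
    bounds, start = [], 0
    for p in range(parts):
        stop = start + base + (1 if p < extra else 0)
        bounds.append((start, stop))
        start = stop
    return bounds


def k_to_j_partition(k_blocks, nj, n_pe):
    """k_blocks[p] holds some k-planes with all j; return blocks with all k and some j."""
    full = [plane for block in k_blocks for plane in block]  # stand-in for the transfer
    return [[plane[j0:j1] for plane in full] for (j0, j1) in split(nj, n_pe)]


def j_to_k_partition(j_blocks, nk, n_pe):
    """Inverse transfer: back to blocks of k-planes that each hold all j."""
    full = [[v for block in j_blocks for v in block[k]] for k in range(nk)]
    return [full[k0:k1] for (k0, k1) in split(nk, n_pe)]


def sweep_k(block, a):
    """Sequential recursion in k, x[k] = r[k] + a * x[k-1], for every local j."""
    out = [row[:] for row in block]
    for k in range(1, len(out)):
        for j in range(len(out[k])):
            out[k][j] += a * out[k - 1][j]
    return out


nk, nj, n_pe, a = 6, 4, 2, 0.5
r = [[float(k + 10 * j) for j in range(nj)] for k in range(nk)]

k_blocks = [r[k0:k1] for (k0, k1) in split(nk, n_pe)]          # normal k-partition
j_blocks = k_to_j_partition(k_blocks, nj, n_pe)               # transfer to j-partition
j_blocks = [sweep_k(block, a) for block in j_blocks]          # each PE sweeps all k
result = j_to_k_partition(j_blocks, nk, n_pe)                 # transfer back
```

In the j-partition every PE holds complete k-lines, so the sequential sweep needs no communication while it runs; the cost is concentrated in the two transfers.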

In Figure 12 the speed-up factor (ratio
of CPU time on 1 PE to CPU time on n
PEs) vs. the number of PEs used is
presented for calculations performed for
the LANN wing with the 160x32x30 grid.
The results strongly depend on whether
the residual averaging technique is used
or not, because of the need to transfer
data between partitions. Its relative
importance in relation to the normal
data transfer workload decreases as the
number of PEs used increases, and both
curves tend to reach a common limit. It
must be borne in mind that the 160x32x30
grid only fills about 20% of the main
memory of a single PE (less than 1% when
32 are used), so that the granularity of
the problem is extremely low. The
parallel efficiency is expected to
increase dramatically for larger grids,
as has been the case with other codes
[20].
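
The quantities discussed above are simple ratios, and a short sketch makes the definitions explicit. The function names are hypothetical; the only numbers used are those quoted in the text (about 20% of one PE's memory, 32 PEs), and no timings are assumed.

```python
def speed_up(t_one_pe, t_n_pe):
    """Speed-up factor: CPU time on 1 PE divided by CPU time on n PEs."""
    return t_one_pe / t_n_pe


def parallel_efficiency(t_one_pe, t_n_pe, n_pe):
    """Speed-up factor per PE; 1.0 would be ideal scaling."""
    return speed_up(t_one_pe, t_n_pe) / n_pe


def memory_fraction_per_pe(fraction_on_one_pe, n_pe):
    """Share of each PE's main memory used when the grid is divided evenly."""
    return fraction_on_one_pe / n_pe


# The 160x32x30 grid fills about 20% of one PE; spread over 32 PEs this
# is roughly 0.6% per PE, consistent with the "less than 1%" in the text.
fraction_32 = memory_fraction_per_pe(0.20, 32)
```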

An indication of the CPU times required
to march the solution in time for one
period of oscillation (using a Δt* of
0.01 for the coarse grid and 0.004 for
the fine one, which correspond to CFL
numbers of 150 and 240 respectively) is
given in Table 1.

5.  CONCLUDING  REMARKS

A  time-accurate  Euler  code  to  calculate 
unsteady  transonic  flow  about 
oscillating  wings  has  been  prepared  and 
implemented  in  the  NWT  parallel 
supercomputer. The achieved performance
has shown the feasibility of using this
type of computationally expensive
method in an engineering environment.
The  influence  of  different  parameters  on 
unsteady  computations  has  been  studied. 
Fig.  12:  Speed-up  Factor.


Fig.  11:  Calculation  Flow. 
Parallel  Implicit  Upwind  Methods  for  the
Aerodynamics  of  Aerospace  Vehicles 

K.J. Badcock¹ and B.E. Richards²


Abstract. Research at the University of Glasgow, based
around implicit methods for solving the Euler and Reynolds’
Averaged Navier-Stokes equations and reported in this
paper, has targeted advanced CFD methods for tackling the
complex flow fields of interest to aerospace vehicle designers.
The requirements for this application are for efficient, high
resolution schemes which can be ported to various MPP systems
and implemented with robustness to give fast turn round
times at competitive cost. It is recognised that the most
demanding topics concern unsteady viscous flows and thus time
accuracy and efficiency are pursued as a high priority. This
paper reviews the work, ongoing and planned, by the team
at Glasgow in code developments embracing future computing
environments and includes some results not previously
published. The example test cases used in the performance
and sensitivity studies include the transonic flow results on
the RAE 2822 aerofoil and ONERA M6 wing selected by
AGARD. The computing environments to which the codes
port include workstations, either used singly or clustered to
provide a parallel computing domain, and also integrated
distributed memory supercomputers such as CRAY T3D and
Intel Hypercube systems. The paper also outlines these
technologies.

1  Introduction 

Aerodynamics has been established as a foundation technology
for the design of aerospace vehicles. Good application
of aerodynamics will lead to substantial economic benefits
for future aircraft designs. Particularly important target areas
include drag reduction to improve direct operating costs and
better prediction of steady and unsteady loads on aircraft to
overcome structural conservatism at the time of freezing the
design. For the majority of aircraft, this requires the particular
capability of predicting the phenomena of shock waves
and flow separation. This can be achieved through a better
understanding of the fluid mechanics of flow interactions
using either experimental techniques or computational methods.
Wind tunnel testing at simulated conditions, particularly
for flutter, for example, is becoming increasingly expensive.
On the other hand, with the rapid developments in computer
hardware and computational techniques, the topic of computational
fluid dynamics is reaching maturity as a viable way
of providing design solutions.

¹ Lecturer, Department of Aerospace Engineering, University of
Glasgow, Glasgow, G12 8QQ, UK

² Professor, Department of Aerospace Engineering, University of

A reasonable simulation of the fluid dynamics of high
Reynolds’ number flows can be obtained by solving the Reynolds’
averaged Navier-Stokes (RANS) equations. Increasing computer
power now makes the solution of these equations feasible.
The level of turbulence model used needs to be a compromise
between a simple eddy viscosity model such as Baldwin-Lomax
and a more complex second moment closure model.
In this work the former is used, but the codes are starting to
use the more general k-ω two equation model.

To satisfy the general requirements for aircraft design,
a code should be accurate, efficient, robust and
usable on future computer architectures. The general approach
chosen by the University of Glasgow CFD Team in this work
is to use high order upwind differencing to provide accuracy
and robustness and to use mostly implicit methods to
provide efficiency [5] [9]. Unstructured grids are also being
considered by the Team as a way forward for dealing with
geometric complexity, but there are developmental difficulties
in tackling viscous flows near boundaries and the memory
requirements are high. Geometric complexity using structured
meshes can be accommodated using multi-block grids, which lend
themselves to distributed memory computing architectures using
a multi-domain approach. The combination of an implicit
approach on a structured grid for wall turbulent flows provides
an efficient code, particularly for unsteady flows.

There exists a considerable variety of computer architectures
from which to choose. The general consensus, however,
is that competitively priced distributed memory massively parallel
processors (MPPs) will provide the Teraflops facility (or
greater) that will be required to tackle CFD solutions using the
RANS model for flows over complete aircraft configurations.
A number of vendors promise production of such Teraflops
facilities in the near future, although the cost is likely to be
beyond the means of all but the largest organisations. There
also needs to be a further investment in adapting the majority
of existing codes to use them. There is a trend to provide a similar
architecture at a much lower cost using workstation clusters.


Paper presented at the AGARD FDP Symposium on “Progress and Challenges in CFD Methods and Algorithms”
held in Seville, Spain, from 2-5 October 1995, and published in CP-578.

On the sort of broad-bandwidth networks that are planned
for the future in Corporate networks, the powerful
high memory workstations that are used for detailed
CAD/CAM design work in the engineering industry in the
daytime can be turned loose at off peak times to
provide a powerful high-memory system.

Creating the parallel computing environment for the Glasgow
CFD Team has proven to be an interesting case history
that it is appropriate to relate as a contribution towards the
theme of this conference. Before 1990, computer systems
within the Universities in the UK had undergone a major upgrade
every seven years, funded by the University Funding
Council, and this generally enabled the acquisition of a useful
multi-user mainframe. Numerically intensive computer users
then had access to off peak cycles through batch facilities.
Before 1994 at Glasgow, for example, the sizeable University
central Computing Service operated a CMS environment on
an IBM 3090 150E vector facility for scientific work, with the
help of a technological agreement with the vendor, as well as
VME and VMS environments on sizeable ICL and DEC facilities,
respectively. From another initiative the University also
acquired a 32 transputer distributed memory Meiko Computing
Surface, along with a systems manager, on which some
early experience of parallel computing was developed by the
CFD Team members to complement time awarded by peer review
on National Facilities such as CRAY-XMP and YMP
vector multi-processors. The Team’s work could be classed
at this stage in the category of high performance computing
(HPC).

In 1994, the Funding Council support changed to a system
of IT support on an annual basis. At the same time the
University adopted an IT strategy to distribute the monies
involved thinly to all Departments whilst providing core
support for the overall Campus Network, including an FDDI
backbone (later an ATM backbone), and a UNIX cluster for
core computing (with a cost imposed on groups who used
cycles above a threshold which was set at a low level). The
implication of this University strategy was the need for HPC
users to prise a proportion of their Department’s allocation of
funds and add it to other initiatives to secure the computing
environment that they needed. Also at about the same time
at National level, resources targeted for research (managed
by EPSRC) were used to purchase a CRAY T3D with 320
DEC Alpha nodes and, following bids, this was placed at the
Edinburgh Parallel Computing Centre at Edinburgh University.
This facility was designated for the exclusive use of a
limited number of University Consortia to tackle Grand Challenge
problems only.

With this background, the team then fronted two main initiatives
to achieve an acceptable computing resource for its
ambitions to develop state-of-the-art code that might be useful
for aircraft designers. The first was to develop a University
Consortium (finally, this included the Universities of Bristol,
Glasgow, Oxford and Swansea and UMIST) that proposed
a topic on Physically and Geometrically Complex Aerodynamic
Flows for Aircraft Flight to use the Cray T3D facility.
The title was changed to the shortened form Computation of
Complex Aerodynamic Flows or CCAF Project after the proposal
was accepted [2]. The other initiative was to develop a
Consortium of Departments within the University to bid for
resources under the New Technologies Initiative (NTI) for
the development of a High Performance Parallel Computing
facility from Spare Capacity on a Network of Workstations³.
NTI was developed by the Joint Information Systems Committee
from funds that the Committee had secured themselves
from the Higher Education Funding Councils to promote pilot
studies towards developing state-of-the-art computing capabilities
across the Universities. When the funds were awarded
the University project was designated the HNW Project. These
projects (both now have a year’s maturity) give the CFD Team
access to a world class resource. These projects will
now be described separately.

One target area of application for the CCAF project is
the study of aeroelasticity at the edges of the flight
envelope, an area in which the non-linearity of the problem
poses considerable uncertainties and is likely to reveal interesting
new mechanics. The challenge is to be in a position to
complement experimental and analytical studies of these complex
physical phenomena using facilities as powerful as the
EPCC Cray T3D. Electrodynamics radiation is also included
in the programme because of the commonality in grids and solution
techniques and the opportunity to widen the application
base of the project. The resource awarded is modest (around
64,000 processor hours per year), but with development being
done on local computing environments and production tests
done on the National facility, the resource is useful.

Two main computational approaches are being pursued in
CCAF. Structured grid work is at a more mature stage:
parallelisation of multi-block codes and dealing with boundary
layers are straightforward, but dealing with geometric complexity
is problematic. Unstructured grid work copes well
with geometric complexity but partitioning causes problems.
The project includes comparisons between the codes developed
in order to determine the best future strategy. In the area of
aeroelasticity, there is a dearth of experimental data of the
quality and appropriateness needed for CFD validation. Nevertheless
the Consortium has identified a suitable unsteady test case
involving the AGARD LANN swept wing to provide an appropriately
challenging common test case. The Glasgow Team
is involved particularly with the development of a multi-block
structured grid flow code meshed with a structural code made
available from industry and uses on average 1,600 processor
hours of T3D resource per month on this. Some preliminary
results are reported below.

At the other end of the cost scale, the HNW cluster project
was awarded 8 man years of effort by JISC over a period of 3
years. The six collaborating Departments in the University of
Glasgow provided funds to purchase equipment and software
for a pilot facility, which could also be used as a demonstrator
for a dedicated cluster as well as a base for testing different
cluster technologies. Following a stringent selection process
and based on the company’s strong interest in the cluster
technology, six Silicon Graphics Indys with MIPS R4400
processors, 64 MB memory and 17 inch monitors were
selected and purchased. These were assembled together in
one laboratory and connected using grade 5 UTP cabling to a
10baseT Ethernet switch, which is the standard presently used
by the University, and this itself was connected to the network.
Using PVM 3.3 message passing, excellent performance was
achieved using the Team’s CFD codes, with little latency [6].
An upgrade to ATM switching on the UTP cabling is
planned in the near future to improve communication speed, as
well as a multi-cluster activity with an adjoining University
linked to the local ATM based Metropolitan Area Network
(MAN) called ClydeNet.

³ See http://www.aero.gla.ac.uk/Research/HNW for full details

From other research projects, ten more Indys have recently
been added to the Departmental 10baseT network. The combined
resource is available generally as a computing domain
to users given an account. Apart from PVM being installed
on the cluster as the message passing software for the parallel
implementations, alternatives for users include MPI (CHIMP
and LAM versions) as well as Oxford Parallel BSP. Clusters
in other Departments in the University are beginning to be set
up in a similar way.

Because of the heterogeneous nature of the user base of
the cluster, a resource management system was required to
optimise use of these cluster resources. The public domain
software NQS, CONDOR and DQS and demonstration versions
of the supported software CODINE™ and LSF™ were
obtained and assessed. LSF™, written by Platform Computing
Inc. of Toronto, had the best ingredients for the University
based project, particularly a multi-cluster capability, and has
been selected by a number of Industries, particularly some
Aerospace Industries, as a means of managing the cluster resource.
A University agreement, which included technological
exchanges towards the future development of LSF™,
made available a multi-platform site license to explore its use
in a University environment and particularly this presently
unique facility of managing multi-clusters. The experience to
date in its implementation is that improved load balancing,
and hence a considerably better use of cycles, is achieved by now
submitting jobs to the domain, rather than to a specific workstation.
The software identifies the best resource for a job and
carries it out transparently to the user. If a user wishes to reclaim
use of a machine for interactive work, the part of the
job being done on that machine is automatically checkpointed
and migrated to another machine with spare capacity. PVM is
embedded in the software so that it provides an ideal system
for queuing and implementing parallel programmes at low
cost. LSF™ provides excellent user interfaces, which help
system managers of clusters to improve their service to users.

With continued development of the cluster technology,
there is evidence that this type of affordable computing could
become a norm in design offices within the Aerospace Industry.
With this background on the technology used at Glasgow,
Sections 2 and 3 of the paper outline the discretisation
methodologies for the two and three dimensional codes and provide
some new examples, and Section 4 discusses the parallel coding
methodology used.

2  Two-Dimensional  Method 

The two dimensional thin-layer Reynolds’ Averaged Navier-Stokes
equations in generalised curvilinear co-ordinates $(\xi, \eta)$,
with $\eta$ normal to the surface, can be denoted in non-dimensional
conservative form by

$$\frac{\partial w}{\partial t} + \frac{\partial f}{\partial \xi} + \frac{\partial (g - s)}{\partial \eta} = 0 \qquad (1)$$

where w denotes the vector of conserved variables, f the
convective streamwise flux, g the convective normal flux and
s the normal viscous flux.

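One implicit step, updating the primitive variables V, can
be written as

$$\left(\frac{\partial w}{\partial V} + \Delta t\,\frac{\partial R_\xi}{\partial V} + \Delta t\,\frac{\partial R_\eta}{\partial V}\right)\delta V = -\Delta t\,(R_\xi + R_\eta) \qquad (2)$$

where $R_\xi$ and $R_\eta$ are terms arising from the spatial discretisation
in the $\xi$ and $\eta$ directions respectively and

$$R_\xi = \frac{\partial f}{\partial \xi}, \qquad R_\eta = \frac{\partial (g - s)}{\partial \eta}, \qquad \delta V = V^{n+1} - V^{n}.$$
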

In the present work the spatial terms are discretised using
Osher’s flux approximation with MUSCL interpolation and
the van Albada limiter for the convective terms and central
differencing for the viscous fluxes. The Baldwin-Lomax turbulence
model is employed to provide a turbulent contribution
to the viscosity but this is not linearised in time in the present
work, i.e. turbulence contributions only appear on the
right-hand side of equation (2). This has been found not to degrade
the stability properties of the methods examined in this paper.

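The alternating direction implicit version of equation (2) is

$$\left(\frac{\partial w}{\partial V} + \Delta t\,\frac{\partial R_\xi}{\partial V}\right)\left(\frac{\partial w}{\partial V}\right)^{-1}\left(\frac{\partial w}{\partial V} + \Delta t\,\frac{\partial R_\eta}{\partial V}\right)\delta V = R_{exp} \qquad (3)$$

where

$$R_{exp} = -\Delta t\,(R_\xi + R_\eta).$$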

The ADI factorisation which appears on the left hand side
of equation (3) has been widely used to approximate a solution
to the system (2) because the banded structure of each of the
factors makes it relatively easy to solve. However, the solution
of the ADI system is not an exact solution of equation (2)
and in practice the factorisation error (the error introduced by
solving equation (3) rather than equation (2)) leads to a practical
limit on the time step and introduces another source of
error into the calculation. This motivates the use of a preconditioned
conjugate gradient solution of the unfactored system.
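
The factorisation error can be made explicit by multiplying out the factored operator. The following short derivation is a sketch that assumes the factored operator has the usual approximate-factorisation form, with $W = \partial w/\partial V$, $X = \partial R_\xi/\partial V$ and $Y = \partial R_\eta/\partial V$ introduced here only as shorthand:

```latex
\begin{aligned}
(W + \Delta t\,X)\,W^{-1}\,(W + \Delta t\,Y)
  &= (I + \Delta t\,X W^{-1})(W + \Delta t\,Y) \\
  &= \underbrace{W + \Delta t\,X + \Delta t\,Y}_{\text{unfactored operator}}
     \;+\; \underbrace{\Delta t^{2}\,X\,W^{-1}\,Y}_{\text{factorisation error}}
\end{aligned}
```

The extra term grows with the square of the time step, which is consistent with the practical limit on the time step noted above.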

Conjugate gradient methods find an approximation to the
solution of a linear system by minimising a suitable residual
error function in a finite dimensional space of potential
solution vectors. Several algorithms are available including
BiCG, CGSTAB, CGS and GMRES. These methods were
tested in [3] and it was concluded that the choice of method
is not as crucial as the preconditioning. However, the CGS
method was found to be the quickest of the three methods that
do not require re-orthogonalisation and is used here. CGS has
the additional advantage that the transpose of the matrix on
the left-hand side of equation (2) is not required, reducing
implementation difficulties. The CGS algorithm was derived
in [10] and is restated in [12].

Denoting the linear system to be solved at each time step
by

$$A x = b \qquad (4)$$

we seek an approximation $C^{-1} \approx A^{-1}$ which yields a
system

$$C^{-1} A x = C^{-1} b \qquad (5)$$

more amenable to conjugate gradient methods. The ADI method
provides a fast way of calculating an approximate solution to
equation (4) or, restating this, of forming the matrix vector
product

$$C^{-1} b = x. \qquad (6)$$

Hence, if we use the inverse of the ADI factorisation as the
preconditioner then multiplying a vector by the preconditioner
can be achieved simply by solving a linear system with the
right-hand side given by the multiplicand and the left hand side
matrix given by the approximate factorisation. The factors in C
are put in triangular form once at each time step with the row
operations being stored for use at each multiplication by the
preconditioner. This roughly doubles the storage requirements
of the method.
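
The idea of using an approximate factorisation as a preconditioner for CGS can be illustrated on a small model problem. The sketch below is not the authors' code: it assumes a scalar model operator A = I + a(L_x + L_y) on an n by n grid, with L_x and L_y the standard three-point second-difference operators, and the factored operator C = (I + a L_x)(I + a L_y), inverted by two sweeps of tridiagonal solves. The CGS iteration follows the standard preconditioned form; all function names and the values of `n` and `a` are assumptions made for illustration. Unlike the method described above, the sketch refactors the tridiagonal systems at every application instead of storing the row operations.

```python
def dot(x, y):
    return sum(p * q for p, q in zip(x, y))


def thomas(lower, diag, upper, rhs):
    """Solve a constant-coefficient tridiagonal system by forward/back substitution."""
    m = len(rhs)
    c, d = [0.0] * m, [0.0] * m
    c[0], d[0] = upper / diag, rhs[0] / diag
    for k in range(1, m):
        piv = diag - lower * c[k - 1]
        c[k] = upper / piv
        d[k] = (rhs[k] - lower * d[k - 1]) / piv
    x = [0.0] * m
    x[-1] = d[-1]
    for k in range(m - 2, -1, -1):
        x[k] = d[k] - c[k] * x[k + 1]
    return x


def apply_A(v, n, a):
    """Unfactored model operator A = I + a (L_x + L_y) on an n x n grid."""
    out = [0.0] * (n * n)
    for i in range(n):
        for j in range(n):
            s = (1.0 + 4.0 * a) * v[i * n + j]
            if i > 0:
                s -= a * v[(i - 1) * n + j]
            if i < n - 1:
                s -= a * v[(i + 1) * n + j]
            if j > 0:
                s -= a * v[i * n + j - 1]
            if j < n - 1:
                s -= a * v[i * n + j + 1]
            out[i * n + j] = s
    return out


def apply_C_inverse(r, n, a):
    """Solve (I + a L_x)(I + a L_y) z = r by two sweeps of tridiagonal solves."""
    y = r[:]
    for j in range(n):  # first factor: lines in the i-direction
        col = thomas(-a, 1.0 + 2.0 * a, -a, [y[i * n + j] for i in range(n)])
        for i in range(n):
            y[i * n + j] = col[i]
    for i in range(n):  # second factor: lines in the j-direction
        y[i * n:(i + 1) * n] = thomas(-a, 1.0 + 2.0 * a, -a, y[i * n:(i + 1) * n])
    return y


def cgs(mat, prec, b, tol=1e-8, max_iter=200):
    """Preconditioned conjugate gradient squared iteration; returns (x, iterations)."""
    x, r = [0.0] * len(b), b[:]
    r_tilde, rho_old, p, q = r[:], 1.0, None, None
    b_norm = dot(b, b) ** 0.5
    for it in range(1, max_iter + 1):
        rho = dot(r_tilde, r)
        if rho == 0.0:
            raise ArithmeticError("CGS breakdown")
        if it == 1:
            u = r[:]
            p = u[:]
        else:
            beta = rho / rho_old
            u = [ri + beta * qi for ri, qi in zip(r, q)]
            p = [ui + beta * (qi + beta * pi) for ui, qi, pi in zip(u, q, p)]
        v_hat = mat(prec(p))
        alpha = rho / dot(r_tilde, v_hat)
        q = [ui - alpha * vi for ui, vi in zip(u, v_hat)]
        u_hat = prec([ui + qi for ui, qi in zip(u, q)])
        x = [xi + alpha * hi for xi, hi in zip(x, u_hat)]
        r = [ri - alpha * hi for ri, hi in zip(r, mat(u_hat))]
        rho_old = rho
        if dot(r, r) ** 0.5 <= tol * b_norm:
            return x, it
    return x, max_iter


n, a = 8, 5.0
b = [1.0] * (n * n)
x, iters = cgs(lambda v: apply_A(v, n, a), lambda v: apply_C_inverse(v, n, a), b)
```

Note that no transpose of the operator appears anywhere in the iteration, and that each step costs two operator applications and two preconditioner applications.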

To illustrate the performance of this method we present results
for flow over an RAE2822 aerofoil at a free stream Mach
number of 0.73, an angle of attack of 2.73° and a Reynolds
number of 6.5 × 10^6. The comparison of convergence rates
for the present method (called AF-CGS), straight ADI and an
explicit local time-stepping method was made in [5] and an
improvement in time to convergence of 25 per cent was noted
for the present method when compared with ADI. When the
free stream flow is used as starting conditions, the CFL number
which yields fastest convergence for the AF-CGS method
is 35 but CFL numbers of up to 50 can be used. The largest
CFL number which yields a solution for ADI is 18 and hence
removing the factorisation error allows the use of larger time
steps. A further reduction in the time to convergence has been
achieved by mesh sequencing. Three
levels of mesh sequencing were used to provide a good starting
solution on the finest mesh (257x65). Using this approach
the optimal CFL number was increased to 100 and the overall
time to converge to within 0.25 per cent of the fully converged
lift value was reduced by a factor of five. The time to convergence
as a function of CFL number is plotted in figure 1 and
shows a clear minimum. This is because there is a balance
between increasing the CFL number to minimise the number
of implicit steps and reducing the CFL number to minimise
the number of CGS steps at each implicit step. The comparison
of the pressure distribution with experiment for various
levels of convergence is shown in figure 2; the agreement
with experiment is good.


Figure 1. Time to converge to within 0.25 % of drag as a function of
CFL number on finest grid


Figure 2. Pressure distribution against chord ratio [x/c] at various
levels of convergence at a CFL number of 100 on the 257x57 grid


A similar approach has been used for unsteady flows over
pitching and plunging aerofoils and aerofoils with moving
flaps [4]. The main conclusion from this work was that AF-CGS
does not allow the choice of time step from purely accuracy
considerations because of the need to limit the time step to
ensure the reasonable performance of the linear solver. However,
AF-CGS does allow for larger time steps and a reduced
computational cost when compared with ADI. For one particular
case the stability restriction on the size of the time step
is a global CFL number of 1000. The average CFL number
during one cycle for the unfactored method is around 2000,
translating into a saving in CPU
time of around twenty-five per cent.


3  Three-dimensional  Extensions 


The extension of the method to three dimensions is complicated
by two considerations. First, computer storage becomes
a limiting factor due to the need to store large Jacobian matrices.
Secondly, the ADI factorisation in three dimensions is
significantly worse than in two dimensions, making its use
as a preconditioner less favourable. This fact, however, means
that there are increased gains to be made in three dimensions
by the use of an alternative to ADI.

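One step of the method considered can be written as

$$\left(\frac{\partial w}{\partial V} + \Delta t\,\frac{\partial R_x}{\partial V} + \Delta t\,\frac{\partial R_y}{\partial V}\right)\left(\frac{\partial w}{\partial V}\right)^{-1}\left(\frac{\partial w}{\partial V} + \Delta t\,\frac{\partial R_z}{\partial V}\right)\delta V = R_{exp} \qquad (7)$$

where

$$R_{exp} = -\Delta t\,(R_x + R_y + R_z).$$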

This two factor step can be loosely described as unfactored
in each spanwise slice and approximately factored in the
spanwise direction. A stability analysis [7] has shown that
the method has similar stability properties to the two factor
ADI method in two dimensions, representing a significant
improvement on the behaviour of the three factor method in
three dimensions. The linear system resulting from the first
factor in equation (7) has a more complicated structure than the
block pentadiagonal systems which are encountered for each
factor in the three factor method. However, this system can be
solved using a direct generalisation of the method described
for two dimensions above, i.e. we solve the system

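$$C^{-1} A x = C^{-1} b \qquad (8)$$

by the CGS method where

$$A = \left(\frac{\partial w}{\partial V} + \Delta t\,\frac{\partial R_x}{\partial V} + \Delta t\,\frac{\partial R_y}{\partial V}\right), \qquad (9)$$

and

$$C = \left(\frac{\partial w}{\partial V} + \Delta t\,\frac{\partial R_x}{\partial V}\right)\left(\frac{\partial w}{\partial V}\right)^{-1}\left(\frac{\partial w}{\partial V} + \Delta t\,\frac{\partial R_y}{\partial V}\right), \qquad (10)$$

$$b = -\Delta t\,(R_x + R_y + R_z), \qquad (11)$$

followed by the solution of a block pentadiagonal system for
the updates

$$\left(\frac{\partial w}{\partial V} + \Delta t\,\frac{\partial R_z}{\partial V}\right)\delta V = \frac{\partial w}{\partial V}\,x. \qquad (12)$$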


The two factor method has substantially reduced memory
requirements compared with the fully unfactored method. For
the third order spatial discretisation there are 13 non-zero 5
by 5 blocks for the rows in the unfactored matrix associated
with any one grid cell. This means that the number of floating
point numbers which must be stored for the coefficient matrix
for a mesh with N cells is 325N. Since N can be of the
order of one million for flows around basic wings, this implies
that even if we can solve the linear system efficiently,
storage requirements will be a limiting factor. For the two factor
method only the matrix for one spanwise slice or one line in
the spanwise direction need be stored at any one time. This
has the effect of reducing the matrix storage requirements at
any one time in the calculation to max(225 N_slice, 125 N_line),
where N_slice is the number of grid points in a spanwise slice
and N_line is the number of grid points in the spanwise
direction. Since N_line N_slice = N it can be seen that the storage
requirements have been reduced substantially (by around two
orders of magnitude for the test case examined in this paper).
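The storage comparison can be checked with a few lines of arithmetic. The sketch below reads the two factor bound as max(225 N_slice, 125 N_line); the factors 225 and 125 equal 9 and 5 blocks of 25 entries, which is an interpretation rather than something the text states. The mesh dimensions used are hypothetical and are not those of the test case.

```python
BLOCK_ENTRIES = 5 * 5  # one 5 by 5 block of floating point numbers

def unfactored_storage(n_cells, blocks_per_row=13):
    """Matrix entries for the fully unfactored method: 13 blocks per cell, i.e. 325 N."""
    return blocks_per_row * BLOCK_ENTRIES * n_cells

def two_factor_storage(n_slice, n_line):
    """Matrix entries held at any one time by the two factor method."""
    return max(225 * n_slice, 125 * n_line)

# Hypothetical mesh: 10 000 points per spanwise slice, 100 points along the span.
n_slice, n_line = 10_000, 100
full = unfactored_storage(n_slice * n_line)
reduced = two_factor_storage(n_slice, n_line)
print(full, reduced, full / reduced)
```

For these assumed dimensions the ratio is a little over one hundred, consistent in magnitude with the reduction quoted above.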

As a test case we shall consider flow over the ONERA M6
wing in transonic conditions. The experimental data for this
wing are available in [13], with several previous computational
results including those in [11]. The flow problem we consider
here has a free stream Mach number of 0.84, an incidence
of 6.06 degrees and a Reynolds number of 11 million. For this case
250 explicit steps were required before FUN was used with a
CFL number of 10. The residual is reduced by about 4 orders of
magnitude from its initial value. This was also observed for
flows over aerofoils in [8] and was due to small oscillations
in the pressure at the far field.

The comparison of the computed pressure distribution with
the experimental results of [13] at six spanwise slices is
shown in figure 3. Good agreement is obtained for the flow
except for the position of the shock and the very last station at
99% span. This has also been observed in [11] for this test
case. Shock induced separation occurs after the strong shock
near the tip and the Baldwin-Lomax model is known to be
inadequate for this phenomenon. In [11] the Johnson-King model
was also implemented, which significantly improved the
results. Figure 3 shows that mesh refinement in the streamwise
direction has very little effect on the solution apart from
sharpening the strong shock near the tip. However, refinement in the
spanwise direction improves not only the resolution of the tip
of the C-H grid, and hence the pressure distribution before the
shock close to the tip, but also the strength of the first shock
in the mid span region. This can be more clearly seen from
the upper wing surface pressure contours shown in figure 4.

4  Parallel  Implementation 

A  detailed  description  of  the  parallel  implementation  of  the  2 
and  3-D  methods  can  be  found  in  [6].  In  the  present  section 
we  summarise  the  main  features  and  give  sample  results. 

The  major  obstacle  to  an  efficient  parallel  implementation 
of  the  AF-CGS  method  is  the  inherently  sequential  nature 
of  the  ADI  procedure.  This  was  overcome  in  [1]  by  using  a 
transposition of the data to allow complete ADI sweeps to
proceed independently on each processor. We use this approach
here  although  extra  communication  is  required  for  the  present 
method  because  of  the  matrix-vector  products  required  in  the 
CGS  algorithm. 

The computational space is mapped onto the nodes by
grouping complete mesh lines in both the ξ and the η
directions onto a single node. Care has to be taken to make sure
that mesh lines on either side of the wake cut are mapped to the
same processor. The computation then falls into three phases.
First, the matrix is generated and the factors are put in
upper triangular form. The next phase is the multiplication of a
vector by the matrix, and finally we have multiplication of a
vector by the preconditioner, which reduces to back
substitution on the triangular factors of the ADI factorisation. For
each phase data is held on a node for complete lines in one
direction in the mesh and the entire computation relating to
that direction is completed. The data is then communicated
so that information for complete lines in the other direction is
held on a single node and the computation for that direction
proceeds.
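The transposition strategy can be mimicked serially to show the data movement. The sketch below is illustrative only: the nodes are simulated as Python lists rather than real processes, a running sum stands in for the sequential line solve, and the function names are hypothetical. Each node sweeps the complete lines it holds, the data is exchanged so that every node holds complete lines in the other direction, and the second sweep then proceeds without further communication.

```python
from itertools import accumulate

def sweep(line):
    """Stand-in for a sequential line solve: a running sum along the line."""
    return list(accumulate(line))

def distribute(lines, n_nodes):
    """Give each simulated node a contiguous group of complete lines."""
    per_node = -(-len(lines) // n_nodes)  # ceiling division
    return [lines[k:k + per_node] for k in range(0, len(lines), per_node)]

def transpose_exchange(blocks, n_nodes):
    """All-to-all exchange: afterwards each node holds complete lines in the other direction."""
    all_lines = [line for block in blocks for line in block]
    return distribute([list(col) for col in zip(*all_lines)], n_nodes)

def two_direction_sweep(grid, n_nodes):
    blocks = distribute(grid, n_nodes)
    blocks = [[sweep(line) for line in block] for block in blocks]  # first direction
    blocks = transpose_exchange(blocks, n_nodes)
    blocks = [[sweep(line) for line in block] for block in blocks]  # second direction
    blocks = transpose_exchange(blocks, n_nodes)                    # restore the layout
    return [line for block in blocks for line in block]

result = two_direction_sweep([[1, 2, 3], [4, 5, 6]], n_nodes=2)
```

In a message passing implementation the exchange step would be an all-to-all communication, which is where the communication cost of this approach is concentrated.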

The parallel code was also implemented on a cluster of
Silicon Graphics Indy workstations at the University of
Glasgow. The message passing was accomplished by using PVM
version 3.3. The comparison of algorithm speeds (time in
μsec/grid point/time step) on the SGI cluster is shown in table
1. The results were obtained on a coarse mesh with only
2400 points and hence the loss in efficiency is quite small
when this is considered.


Machine                 algorithm speed   efficiency
SGI cluster 1 node      958               1.00
SGI cluster 6 nodes     230               0.69
SGI cluster 8 nodes     194               0.62

Table 1. Algorithm speeds in μsec/grid point/time step on the SGI
cluster.


The  three-dimensional  algorithm  has  two  distinct  phases. 
First,  there  is  the  generation  and  solution  of  the  large  lin¬ 
ear  system  arising  from  each  spanwise  slice  of  the  mesh. 
Secondly,  there  is  the  solution  of  the  banded  linear  systems 
arising  from  the  second  factor  in  the  spanwise  direction. 

The first phase is split between processors in two ways.
First, the spanwise sections are split into groups. Each group
is then assigned to a set of processors with each spanwise
slice in the group being treated in a similar way to the two
dimensional algorithm described above by those processors.
The communication between the different groups of
processors, each treating a different set of spanwise slices, is simply
that which would be required by an explicit method so that
the contributions to the residual (or the right-hand-side of the
linear system) from the spanwise fluxes at the interfaces
between the spanwise groupings can be evaluated. Since there
is significantly less communication involved at this stage than
is required to solve a spanwise slice in parallel, it is clear that
the most efficient partition of the problem will arise when as
large a number of spanwise groups as possible is used. For a
fixed number of total processors this will reduce the number
of processors which operate on a spanwise section.
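One way to picture this partition is sketched below. It is an assumption-laden illustration, not the scheme used in the code described here: it simply picks the largest number of spanwise groups that divides the processor count and does not exceed the number of slices, then shares slices and processors evenly. The function name and the example sizes are hypothetical.

```python
def partition_processors(n_procs, n_slices):
    """Split processors into as many equal spanwise groups as possible."""
    n_groups = max(g for g in range(1, min(n_procs, n_slices) + 1)
                   if n_procs % g == 0)
    procs_per_group = n_procs // n_groups
    base, extra = divmod(n_slices, n_groups)
    groups, start = [], 0
    for g in range(n_groups):
        count = base + (1 if g < extra else 0)  # spread any remainder over the first groups
        groups.append({
            "slices": list(range(start, start + count)),
            "procs": list(range(g * procs_per_group, (g + 1) * procs_per_group)),
        })
        start += count
    return groups

# Hypothetical example: 8 processors and 33 spanwise slices.
groups = partition_processors(8, 33)
```

With more processors than slices, several processors share each group and the two dimensional transposition strategy is used within the group.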

The second phase of the calculation involves assigning
complete spanwise lines in the mesh to single processors.
Again, a transposition of the data is used so that the
calculation involving a single line can proceed on a single
processor without further communication. Once the updates are
available a second transposition is used to restore storage by
spanwise slices for the next time step.

The  method  has  been  implemented  in  parallel  on  a  range 
of  machines.  The  algorithm  speeds  for  the  Cray  T3D  and  the 
SGI  cluster  are  given  in  table  2  for  grids  with  140000  grid 
points  for  the  T3D  and  roughly  half  this  number  on  the  SGI 
cluster. The parallel efficiencies will increase when the grid is
refined; however, a high parallel efficiency has been obtained
on  128  nodes,  even  for  this  relatively  small  problem.  Excellent 
efficiency  is  obtained  on  the  SGI  cluster. 


                Explicit timesteps        Implicit timesteps
No. of nodes    speed     efficiency      speed     efficiency
T3D 1           417       1.00            1510      1.00
T3D 16          29.6      0.88            107       0.88
T3D 32          15.6      0.84            55.0      0.86
T3D 64          8.25      0.79            28.9      0.82
T3D 128         4.63      0.70            15.7      0.75

No. of nodes    speed     efficiency
SGI 1           2372      1.00
SGI 6           416       0.95

Table 2. Algorithm speeds in μsec/gp/ts and parallel efficiency for
the Cray T3D and SGI cluster
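The efficiencies in tables 1 and 2 are consistent with the usual definition of parallel efficiency, the single node time divided by the product of the node count and the parallel time. The paper does not state its definition, so the formula below is an assumption; the function name is illustrative and the numbers are taken from the tables.

```python
def parallel_efficiency(t_one_node, t_parallel, n_nodes):
    """Efficiency = T(1) / (p * T(p)), with times per grid point per time step."""
    return t_one_node / (n_nodes * t_parallel)

# (single node speed, parallel speed, nodes, tabulated efficiency)
table_entries = [
    (958, 230, 6, 0.69),      # table 1, SGI cluster
    (958, 194, 8, 0.62),      # table 1, SGI cluster
    (417, 4.63, 128, 0.70),   # table 2, T3D explicit
    (1510, 15.7, 128, 0.75),  # table 2, T3D implicit
    (2372, 416, 6, 0.95),     # table 2, SGI cluster
]
computed = [round(parallel_efficiency(t1, tp, p), 2) for t1, tp, p, _ in table_entries]
```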


5  Conclusions 

The programmes that are providing a world class
computing environment for the development of CFD codes at the
University of Glasgow were described. High quality access
to the 320 processor EPCC Cray T3D was obtained through
forming the CCAF consortium on the problem targeted in this
report. At the other end of the cost scale, a parallel environment
based on the spare capacity on workstations mounted on a
quality network under the HNW project was described. It was
predicted that this latter type of computing environment would
be a standard within the design offices of aerospace companies
in the future.

An implicit method for simulating three-dimensional
compressible and viscous flow, developed to run on a distributed
memory parallel environment, is outlined. The AF-CGS method
is based on a two-dimensional approach which consists of an
iterative solution of the linear system by the conjugate gradient
squared algorithm with preconditioning by the alternating
direction implicit factorisation. The FUN (factored-unfactored)
method tackles three-dimensional flows and builds on the
two-dimensional method by factoring the linear system into
a factor arising from spanwise slices in the mesh and a block
pentadiagonal factor arising from strips in the spanwise
direction. The more complicated factors arising from the
spanwise slices are solved by the two-dimensional method. This
approach yields a method which has similar properties to the
2-D ADI method, a situation which is substantially better than
a three-dimensional version of ADI. A study concerning the
optimisation of the AF-CGS code using RAE 2822 Case 9 was
carried out. Three levels of mesh sequencing were used to
obtain a starting solution on a fine mesh of 257 x 65. Then
the optimal CFL number used was increased to 100, and the
overall time to converge to within 0.25 per cent of the fully
converged lift value was reduced by a factor of 5. When
applied to unsteady flows AF-CGS was shown to allow for larger
time steps and a reduced computational cost when compared
to ADI.

The FUN code was tested through the prediction of the
flows over the ONERA M6 wing using the Cray T3D. Even
for the relatively coarse grid tested, parallel efficiencies of 75
per cent were achieved using 128 nodes. Improved efficiencies
will be achievable using finer grids. The comparisons with the
experiment using a Baldwin-Lomax turbulence model were
found to be satisfactory, but improvements are expected when
the k-ω turbulence model is implemented.

Future work includes the development of a multi-block
approach and the testing of the unsteady 3-D code and its
coupling with a structural code to tackle aeroelasticity cases.
Work is underway on multiblock extensions of the
methodology presented.

PROGRESS AND CHALLENGES IN CFD METHODS AND ALGORITHMS

GENERAL  DISCUSSION 

J.W.  Slooff.  NLR.  Netherlands 

After Dr. Kroll has given his opening remarks we will open up to the
floor and try to get a, hopefully, lively discussion on various
issues and aspects that we may wish to address. But first, Dr.
Kroll, please give us your on-the-spot evaluation.

N.  Kroll.  DLR.  Germany 

Thank  you  for  the  invitation.  I  appreciate  that  I  can  act  as  the 
evaluator for the CFD Symposium. Before I go into details, I would
like  to  mention  that  this  evaluation  reflects  my  personal  thoughts 
and  is  based  mainly  on  the  oral  presentations.  Only  a  few  papers 
reached  me  before  the  Conference,  so  I  did  not  get  the  time  to  go 
into  the  details  of  the  papers.  Therefore,  in  the  written  version 
some  of  my  statements  may  be  revised,  but  I  think  the  essential 
messages  will  not  change. 

The  background  of  this  Symposium  is  the  fact  that  CFD,  as  we  all 
know,  is  widely  accepted  as  a  key  tool  for  aerodynamic  design. 
However,  on  the  other  hand,  we  also  know  that  CFD  still  has 
deficiencies  in  accuracy,  complexity,  robustness,  and  efficiency. 

Due  to  this,  in  industry  CFD  is  not  yet  being  exploited  as 
effectively  as  one  would  expect.  Therefore,  this  Symposium  has  been 
set  up  with  the  aim  to  present  and  discuss  those  topics  which  are 
considered  as  likely  to  constitute  pacing  items  and  new  challenges  in 
CFD.  The  work  presented  here  will  be  evaluated  against  the  ambitious 
theme  of  the  Conference.  From  my  point  of  view  and  from  what  I  saw 
in  the  invited  papers,  for  the  aeronautical  industry  CFD  is  expected 
to  deliver:  1.  detailed  viscous  flow  analysis  for  complex  geometries 
at high Reynolds numbers, 2. accurate prediction of aerodynamic
data,  3.  fast  turnaround  calculations  at  acceptable  costs,  4. 
aerodynamic  design  and  optimization  of  aircraft  components  or 
complete  aircraft  and  5.  interdisciplinary  analysis.  There  may  be 
many  other  key  problems,  but  in  my  opinion,  these  are  among  the  most 
important  ones  in  order  to  raise  the  confidence  level  of  CFD  in  the 
aeronautical  industry. 

With  respect  to  the  scope  of  the  Conference,  I  expected  contributions 
to  the  following  topics:  improvement  of  basic  algorithms  including 
space  discretization,  time  integration  and  fast  iterative  methods; 
advanced techniques to treat complex configurations including
blockstructured  methods,  unstructured  and  hybrid  grids  as  well  as 
Cartesian  and  Chimera  techniques;  adaptive  methods;  parallel 
computing;  effective  algorithms  for  more  complex  applications  such  as 
turbulent  flows,  chemically  reacting  flows  and  unsteady  flows;  design 
and optimization methods; effective methods for multidiscipline
physics.

Three  invited  and  34  technical  papers  were  presented.  First  I  would 
like  to  make  some  general  remarks  on  the  technical  quality  of  the 
papers.  Many  of  the  papers  were  of  high  quality  because  they 
represented  the  current  status  of  the  CFD  community  and  they 
identified or presented new important directions of algorithmic
development  in  CFD.  But  in  my  opinion,  also  many  papers  of  lower 
quality  were  delivered  which  either  were  not  within  the  scope  of  this 
Symposium  or  did  not  represent  the  current  status  and  progress  of 
CFD.  Moreover,  some  of  them  reinvented  well-established  knowledge  in 
CFD. The Symposium covered eight major topics as shown in this
vu-graph.

Although  many  papers  addressed  several  subjects,  I  categorized  them 
according  to  their  central  focus.  The  result  of  this  classification 
is  somewhat  different  from  the  session  grouping  which  was  set  up  by 
the  Program  Committee.  There  were  10  or  11  papers  on  advanced 
discretization  schemes,  only  three  papers  on  fast  implicit  iterative 
solvers  and  a  bunch  of  papers  on  parallelization.  Several  papers 
were  given  on  unstructured  meshes,  overlapping  grids  and  hybrid 
grids.  This  morning  we  heard  papers  on  adaptive  grids  and  two  papers 
concerning  specific  algorithmic  aspects  on  chemically  reacting  flows. 
We  had  some  papers  on  DNS/LES  and  on  unsteady  flows.  In  the 
following,  I  would  like  to  go  through  each  subject  and  to  make  some 
comments  on  what  was  presented  and  whether  the  major  challenges  were 
addressed  by  the  papers. 

First  let  me  say  a  few  words  about  the  invited  papers.  The  first 
paper  was  presented  by  Anthony  Jameson.  He  gave  an  excellent 
overview  about  the  present  status,  challenges  and  future  development 
of  CFD.  He  identified  some  important  challenges,  in  particular  the 
3-D  viscous  flow  simulation  for  high  Reynolds  numbers.  He  mentioned 
that  about  8  to  10  million  points  are  needed  in  order  to  accurately 
resolve  turbulent  flows  and  to  predict  the  drag  coefficient  within 
one  count.  His  presentation  on  the  unified  theory  for  1-D  shock 
capturing  methods  was  very  interesting.  He  showed  that  unifying 
different  schemes  may  help  in  designing  new  improved  methods  such  as 
his  newly  developed  CUSP  or  HCUSP  scheme.  Jameson  also  addressed  the 
important  topic  of  aerodynamic  design  and  optimization. 

The  second  invited  paper  was  delivered  by  Paul  Rubbert.  He  talked 
about  CFD  research  in  the  changing  U.S.  aeronautical  industry.  I 
think  it  was  a  very  interesting  paper  because  he  identified  the 
challenges  which  are  beyond  the  technical  ones.  He  made  an  analysis 
of  the  process  by  which  CFD  capabilities  are  created.  He  mentioned 
that  research  can  be  improved  by  introducing  new  principles  like 
customer  focus  and  customer  satisfaction.  From  the  technical 
reviewer's  point  of  view,  some  comments  and  statements  of  the 
aeronautical  industry  on  the  status  of  CFD  and  future  requirements 
would  have  been  desirable. 

The  third  paper  was  presented  by  Doyle  Knight.  He  gave  a  nice 
overview  on  parallel  computing.  Since  he  explained  the  terminology 
used  in  parallel  computing,  he  formed  the  basis  for  the  audience  to 
follow  the  technical  papers  on  parallelization.  He  discussed 
important  issues,  and  he  gave  several  examples  of  experience  with 
parallel  computing  in  the  aerospace  industry. 

From  my  point  of  view,  the  Program  Committee  did  a  good  job 
concerning the selection of the keynote papers. In my opinion,
however, an invited paper on status and progress on grid generation
for  complex  configurations  was  missing.  Although  grid  generation  was 
not  a  subject  of  this  Symposium,  I  think  that  such  a  paper  would  have 
been  very  helpful  for  the  assessment  of  structured  and  unstructured 
methods.  The  issues  of  turnaround  time  and  accuracy  of  a  numerical 
method  very  often  depend  on  the  capability  of  the  available  grid 
generation  procedure. 

Now  let  me  come  to  the  specific  subjects.  Several  papers  on  parallel 
computing  were  presented.  It  is  obvious  that  routine  use  of  CFD  and 
future  large  applications  in  aeronautics  require  parallel  computing. 
The  papers  addressed  several  important  issues  such  as  parallelization 
strategies,  portability,  performance  and  load  balancing.  Some  papers 
were  devoted  to  the  adjustment  of  algorithms  designed  for  sequential 
computers to parallel architectures. The issue of scalability was
only  barely  addressed  although  it  is  one  of  the  key  features  for 
efficiently  exploiting  parallel  computing.  Only  a  few  3-D 
applications  on  parallel  computers  have  been  presented.  The 
efficient  use  of  parallel  architectures  for  3-D  complex  industrial 
configurations  seems  to  be  still  a  major  challenge.  The  reason  for 
this  may  be  the  problem  of  load  balancing.  In  the  case  of 
unstructured meshes, much work has been done with respect to domain
partitioning and some public domain software is already available.
However, for structured meshes the load balanced partitioning is much
more  complicated  mainly  due  to  the  geometrical  restrictions. 
Furthermore,  I  believe  that  the  adjustment  of  sequential  algorithms 
to  parallel  computers  is  not  sufficient.  New  parallel  algorithms 
have  to  be  developed.  As  an  example,  it  is  well  known  that  the 
multigrid  method  is  not  fully  scalable,  and  therefore  may  not  exploit 
the  full  performance  of  massively  parallel  computers.  In  conclusion, 
challenges  for  parallel  computing  for  the  near  future  are  scalable 
implementations  of  3-D  applications,  load  balancing  for  structured 
mesh  calculations  and  development  of  new  parallel  algorithms.  Future 
work should address these issues.

Now  let  me  discuss  the  topic  of  advanced  space  discretization. 

Various  promising  schemes  have  been  presented  showing  some 
improvements  over  conventional  methods.  For  example,  papers  were 
devoted  to  quadratic  reconstruction  with  flux-limiters,  improved 
flux-splitting schemes, multidimensional upwinding and kinetic
flux-splitting. My criticism here is that in some of these papers, the
assessment of the new algorithms was restricted to only one or two
aspects of spatial discretization. In designing new discretization
schemes, several different aspects have to be addressed, including
high  resolution  of  viscous  shear  layers,  sharp  shock  resolution, 
conservation,  robustness  at  shocks  and  in  expansion  regions,  overall 
efficiency  and  compactness  of  the  stencil.  In  my  opinion,  a  detailed 
comparison  of  available  schemes  covering  all  these  issues  is  needed 
in  order  to  assess  potentials  and  limitations  of  advanced  algorithms. 
I  also  would  like  to  mention  that  very  often  a  detailed  accuracy 
assessment  of  new  or  modified  methods  is  not  carried  out.  An 
assessment  study  should  include  investigations  with  respect  to  grid 
refinement and other important numerical sensitivities.
Well-established test cases for the Euler and Navier-Stokes equations
should be calculated to raise the confidence level of the new
techniques. Of course, the advanced methods have to be applied to
those  problems  for  which  the  standard  schemes  show  substantial 
deficiencies.  Multidimensional  upwinding,  from  my  point  of  view, 
made  large  progress  in  the  last  few  years,  but  I  still  think  that 
these  schemes  -  and  also  kinetic  algorithms  -  are  not  yet  at  the 
stage  to  be  used  in  a  3-D  production  code.  The  major  challenge  for 
advanced  discretization  schemes  is  the  accurate  calculation  of  3-D 
viscous flows. Besides high resolution schemes, improved turbulence
models  are  required.  Turbulence  modelling,  however,  was  not  a  topic 
of  this  Symposium. 

Let  me  come  to  the  third  subject  on  fast  implicit  and  iterative 
methods.  Here  good  papers  on  Newton-Krylov  subspace  methods  were 
presented.  For  standard  cases,  like  2-D  inviscid  flows  or  viscous 
flows  with  moderate  Reynolds  numbers,  the  more  sophisticated  methods 
such  as  multigrid,  Newton-Krylov  subspace  methods  and  advanced 
implicit  schemes  perform  almost  equally  well.  From  the  literature, 
it  is  obvious  that  multigrid  is  mostly  used  with  structured  meshes, 
whereas  for  unstructured  grids,  very  often  Newton's  method  with 
Krylov  subspace  iteration  is  applied.  For  3-D  flows  around  complex 
configurations  the  situation  is  not  clear.  We  do  know  that  for 
generic  configurations  and  moderate  Reynolds  numbers  multigrid  is 
quite  efficient.  There  is  not  much  known  about  the  Newton-Krylov 
methods.  Open  questions  are  the  subspace  dimension,  the  memory 
requirements  and  the  computational  costs.  Much  more  effort  is 
required  to  explore  the  capabilities  and  limitations  of  these 
methods.  In  my  view,  the  real  challenge  concerning  the  development 
of  efficient  time  integration  algorithms  was  not  addressed  here.  It 
is the simulation of realistic Reynolds number flows in 2-D and 3-D.
Due  to  efficiency  reasons,  for  these  flows  high  aspect  ratio  cells 
are  required.  Due  to  the  lack  of  efficient  smoothers,  the 
convergence  behavior  of  the  multigrid  method  gets  worse.  In  the  case 
of Krylov subspace methods, a suitable preconditioner has to be
designed.  Future  work  should  be  devoted  to  the  development  of 
efficient  time  integration  algorithms  for  stiff  discrete  equations 
due  to  high  aspect  ratio  cells. 

There  was  a  nice  paper  on  time-preconditioning,  which  I  think  is  a 
very  interesting  approach  to  achieve  Mach  number  independent 
convergence.  A  key  concern  is  the  development  of  a  unified  flow 
solver  covering  incompressible  flows  up  to  hypersonic  flows.  I  do 
not think that the technique is already mature; however,
preconditioning  is  a  good  candidate  to  reach  that  goal. 

Another  subject  dealt  with  unstructured  and  hybrid  methods.  The 
paper  from  Rockwell  stated  that  unstructured  grids  are  well  suited 
for inviscid flows including flows around complex 3-D configurations.
On  the  other  hand,  experience  shows  that  for  accurate  viscous 
calculation,  some  kind  of  regular  cells  are  required  in  the  boundary 
layer.  So  the  approach  of  hybrid  grids  may  be  a  good  choice  because 
it  combines  all  the  advantages  of  structured  and  unstructured  meshes 
and  offers  the  possibility  for  an  automatic  simulation  of  3-D  complex 
configurations.  The  work  presented  here  on  hybrid  meshes  is  in  an 
early  stage  and  substantial  effort  is  required  to  establish  a 
valuable  tool  for  complex  viscous  applications.  For  the  simulation 
of configurations with moving bodies, the overlapping grid technique
seems  to  be  very  interesting.  The  meshless  technique  approach  is  an 
interesting  idea,  but  it  shows  many  deficiencies.  For  example, 
conservation  is  not  guaranteed  and  the  control  of  accuracy  is  quite 
difficult.  Much  work  is  required  to  get  some  confidence  in  this 
approach.  For  all  discretization  strategies,  considerable  effort  is 
still  required  to  significantly  reduce  the  turnaround  time  for 
viscous  simulations  for  complex  3-D  configurations.  Some  promising 
results  have  been  presented  here. 

The  next  subject  covers  the  activities  on  adaptive  methods.  It  is 
well  known  that  adaption  is  an  important  issue  for  cost-effective 
calculations.  Various  strategies  have  been  presented  including  mesh 
movement  and  mesh  refinement  for  both  structured  and  unstructured 
grids. In my opinion, there are several open questions, some of which
were addressed at the Conference. A key issue is the selection of
suitable  criteria  for  grid  adaption.  As  proposed  by  several  papers, 
finite  element  error  indicators  seem  to  be  the  right  choice.  They 
ensure  that  the  solution  will  not  be  sensitive  to  the  adaption 
pattern.  However,  so  far  in  most  applications  local  flow  gradients 
are  used  as  sensors.  In  these  cases,  the  estimation  of  the  overall 
accuracy  is  quite  difficult  and  a  grid  independent  solution  may  not 
be  obtained.  With  respect  to  parallel  computing,  the  problem  of 
dynamical  load  balancing  occurs,  especially  in  the  case  of  structured 
meshes.  The  challenge  of  adaptive  methods  is  the  application  to  3-D 
viscous  flow  fields.  As  mentioned  here  by  several  authors, 
considerable  work  is  required  to  extend  error  based  indicators  to 
viscous  flows. 

Concerning  unsteady  flows,  several  papers  presented  time  accurate 
calculations  for  incompressible  and  compressible  flows.  Some 
attempts  were  made  to  cut  the  cost  of  time  accurate  calculations. 
However,  it  is  obvious  that  new  innovative  concepts  have  to  be 
developed in order to efficiently simulate 3-D viscous unsteady
flows.

A  few  papers  addressed  LES  and  DNS.  At  the  moment  both  simulation 
techniques focus on fundamental research of flow physics, especially
turbulence.  Specific  requirements  on  LES  and  DNS  solvers  were 
discussed  including  high  resolution  in  time  and  space,  adaptive  grids 
and  parallel  computing.  Based  on  these  sophisticated  methods,  one 
paper  held  out  a  prospect  of  large  eddy  simulation  of  a  clean  wing  at 
moderate  Reynolds  number  in  the  near  future.  In  my  opinion, 
significant research work on both algorithms and subgrid models is
required  to  enable  this  simulation.  Some  of  the  papers  dealing  with 
this  subject  did  not  meet  the  scope  of  this  Symposium. 

The  topic  of  chemically  reacting  flows  was  covered  only  by  two 
papers.  Both  presented  modifications  and  improvements  of  numerical 
methods  to  meet  the  requirements  of  hypersonic  reacting  flows, 
namely,  sharp  capturing  of  strong  shocks,  high  resolution  of  viscous 
regions,  robustness  in  regions  of  flow  expansion  and  efficient 
solution  of  stiff  equations.  Promising  results  for  2-D  and  3-D  flows 
were  presented. 

This concludes my technical comments on the various subjects. Now I
would  like  to  give  a  few  concluding  remarks.  Measured  against  the 
theme  of  the  Symposium,  in  my  opinion,  many  papers  of  high  quality 
but  also  many  papers  of  lower  quality  were  given.  Concerning  the 
technical  standard,  there  was  quite  some  difference  between  this 
Symposium  and  other  conferences  such  as  the  AIAA  conferences  in  the 
U.S. or the ECCOMAS in Europe. In order to improve the quality of
the  papers,  one  should  ask  for  extended  abstracts.  CFD  has  so  many 
aspects  and  facets  that  it  is  very  difficult  to  assess  the  quality  of 
a  paper  with  only  a  few  pages  of  abstract.  Furthermore,  one  should 
define  some  criteria  or  certain  procedures  which  have  to  be  met  by 
the abstracts. This could include the calculation of specific test
cases.

Nevertheless,  I  would  like  to  say  that  all  in  all  the  Symposium  was 
interesting. We saw some recent developments and achievements in CFD
which  I  have  mentioned  before.  Several  problems  were  identified, 
being  pacing  items  for  algorithmic  improvements  and  new  developments. 
However,  from  my  personal  point  of  view,  the  Symposium  did  not 
reflect  the  actual  status  of  the  CFD  community  compared  to  other  CFD 
conferences.  Many  leading  experts  were  not  present.  Especially, 
there  was  only  a  small  contribution  by  the  U.S.  Many  aspects  and 
recent  developments  were  not  addressed  in  this  Conference. 
Furthermore,  in  some  areas,  I  think  CFD  is  much  more  developed  than 
it  was  presented  here.  I  would  like  to  say  a  few  words  concerning 
the  challenges  I  have  mentioned  previously.  Many  of  the  key  issues 
important  for  industry  were  not  covered  here.  No  paper  tackled  the 
problem  of  high  Reynolds  number  flows.  There  were  not  many  papers  on 
accurate  drag  calculation  for  viscous  turbulent  flows.  Concerning 
the  problem  of  short  turnaround  time  for  complex  configurations,  some 
advanced  approaches  including  unstructured  and  hybrid  methods  were 
presented. However, no paper addressed 3-D viscous calculations
around more complex geometries. There was only one paper - Jameson's
invited lecture - dealing with design optimization. No paper
addressed interdisciplinary methods, an issue which is definitely a
future challenge in CFD.

Finally,  I  would  like  to  remark  that,  in  my  opinion,  the  scope  of  the 
Symposium was too encompassing. It is almost impossible to cover all
important new directions in CFD within an AGARD conference of three
and one-half days. It would have been better to restrict the
Symposium to some specific subjects, say adaptive methods. In that
case a comprehensive overview and review of this particular subject
would  have  been  possible.  New  directions  and  developments  and  their 
critical  assessment  could  have  been  addressed  in  more  detail. 


J.W. Slooff. NLR. Netherlands

Thank you very much Dr. Kroll for what I think was a very appropriate
and to-the-point evaluation with a good balance of critique and
praise. One small remark from my side: I think that there is an
internal  conflict  between  two  of  your  statements  in  the  sense  that  on 
the  one  hand  you  think  not  enough  subjects  of  CFD  were  covered  and  on 
the  other  hand,  you  said  the  scope  was  too  wide  for  only  three  and 
one  half  days. 


N.  Kroll.  DLR.  Germany 

Measured  against  the  scope  and  aim  of  this  Symposium,  not  all 
important  issues  and  subjects  were  covered. 

J.W. Slooff. NLR. Netherlands

We  are  now  at  the  Open  Discussion  part  of  the  final  session.  The 
last  thing  I  would  like  to  do  is  to  put  the  discussion  into  some  sort 
of a straitjacket, but on the other hand, it occurred to me that
if  we  go  about  this  in  a  completely  unstructured  way,  the  discussion 
might  quite  easily  become  pointless.  What  I  suggest  is  that  we 
proceed  in  the  following  way.  First  of  all,  I  would  like  to  spend  5 
or  10  minutes  to  give  direct  comments  on  some  of  the  statements  made 
by  the  Technical  Evaluator.  Some  of  you  may  have  the  urge  to  do  so. 
After  that,  I  suggest  that  we  try  to  look  at  things  from  some 
distance  in  order  to  get  a  better  perspective  of  what  we  are  doing 
and  what  we  are  doing  it  for.  In  doing  so,  I  think,  we  should  add  a 
background  as  a  sort  of  framework  against  which  we  can  project  our 
comments, remarks and suggestions. Against this background we should keep
questions in mind like: What are industry's requirements? What do we
have to offer as the CFD research community, what kind of new
developments? Which of these developments have the best prospects
for better meeting industry's requirements?

Before  we  start  the  actual  discussion,  I  have  to  point  out  a  few 
administrative  things.  This  Round  Table  Discussion  is  being  recorded 
and  a  transcript  of  the  tape  will  be  made.  That  does  not  mean  that 
you  should  confine  yourself.  You  don't  have  to  be  afraid  of  saying 
things  that  you  will  be  confronted  with  afterwards.  You  will  be  sent 
a  copy  of  the  transcript  of  the  tape  and  you  will  have  the 
opportunity  to  edit  your  comments  and  remarks.  In  order  to  be  able 
to  do  so,  it  is  necessary  that  you  clearly  state  your  name  and 
affiliation  so  that  we  know  who  spoke  and  who  to  send  the  transcript 
to. I will come back later to some of the basic questions and
provide  a  little  bit  more  framework  for  our  discussion.  But  first, 
who  would  like  to  give  some  direct  comments  on  some  of  the  remarks 
that  the  Technical  Evaluator  gave  us  just  a  few  minutes  ago?  I 
imagine  the  Program  Committee  Chairman  might  have  to  say  something. 

J.A. Essers. University of Liege. Belgium

First  of  all  I  think  that  Dr.  Kroll  did  a  very  good  job,  and  perhaps 
you  will  be  surprised  to  note  that  I  agree  with  almost  everything  he 
said.  But  anyway,  I  have  a  technical  remark  to  make  on  one  point. 
This point concerns the claim that it is better to use
structured grids for viscous flows, and that an unstructured grid
is perhaps good for inviscid flows. I disagree with that. Many people
believe  that  with  unstructured  grids,  you  cannot  compute  accurately  a 
boundary  layer  or  other  shear  layers.  That  is  wrong.  We  made  some 
calculations  with  quadrilateral  and  triangular  grids  that  were 
extremely irregular and got very good accuracy with, for example,
parabolic reconstruction techniques. My feeling is that many people
use  discretizations  which  are  not  accurate  enough  for  the  viscous 
terms  when  the  grid  is  distorted,  but  if  you  use  schemes  which  are 
weakly  sensitive  to  mesh  distortions,  it  can  work  fine.  Anyway,  I 
must confess that you usually need more points in a boundary layer if
you use unstructured grids than if you use a structured grid, so it
is anyway worthwhile to use the hybrid grids for that reason. Now I
would like to make another comment concerning Dr. Kroll's remarks. I
said  that  I  essentially  agree  with  what  he  said,  but  he  should  be 
aware  of  the  fact  that  the  Program  Committee  of  an  AGARD  meeting  has 
constraints  that  organizers  of  large  conferences  don't  have.  For 
example,  we  are  almost  not  allowed  to  have  parallel  sessions.  And 
parallel sessions would have been necessary. I proposed parallel
sessions, but I immediately had 10 opponents in the group, so it was
impossible to do it. That is mainly because we need a Technical
Evaluator, who of course cannot attend all sessions at the same time.
Something  else,  also,  is  that  we  have  some,  let  us  say,  political 
constraints,  in  the  sense  that  it  is  important  to  AGARD  that  all  NATO 
countries  can  participate  in  such  conferences,  to  present  the  status 
of  the  research  in  their  country,  and  that  is  a  constraint  other 
conferences  don't  have. 

B.  Masure.  STREHNA.  France 

The  Technical  Evaluator  said  that  many  papers  did  not  address  the 
problem of the accuracy assessment. I ask the Technical Evaluator to
tell us what exactly an accuracy assessment for a code is.


N. Kroll. DLR. Germany

For structured meshes, one should make sure that by refining the grid
the results will become independent of the grid. Furthermore, the
order of accuracy claimed by theory can be checked by grid refinement
studies.  In  case  of  unstructured  meshes  the  accuracy  assessment  may 
be  more  difficult,  but  I  think  we  can  borrow  some  techniques  from 
finite  element  theory. 
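
The grid refinement check described above can be sketched as follows. This is a minimal illustration and not part of the recorded discussion: it assumes three grids with a constant refinement ratio and a scalar output (say, a force coefficient) in the asymptotic range, and it uses a made-up analytic function in place of a flow solver. The function names `observed_order` and `richardson_extrapolate` are hypothetical.

```python
import math


def observed_order(f_coarse, f_medium, f_fine, ratio):
    """Observed order of accuracy from three systematically refined grids."""
    return math.log(abs(f_coarse - f_medium) / abs(f_medium - f_fine)) / math.log(ratio)


def richardson_extrapolate(f_medium, f_fine, ratio, order):
    """Estimate of the grid-independent value from the two finest grids."""
    return f_fine + (f_fine - f_medium) / (ratio ** order - 1.0)


def synthetic_result(h):
    """Stand-in for a solver output (assumption): exact value 1.0 plus a second-order error."""
    return 1.0 + 0.5 * h ** 2


# Three grid spacings with a constant refinement ratio of 2.
spacings = [0.4, 0.2, 0.1]
f_coarse, f_medium, f_fine = (synthetic_result(h) for h in spacings)

p = observed_order(f_coarse, f_medium, f_fine, ratio=2.0)
f_limit = richardson_extrapolate(f_medium, f_fine, ratio=2.0, order=p)

print(f"observed order: {p:.3f}")
print(f"extrapolated grid-independent value: {f_limit:.6f}")
```

If the observed order falls well short of the order claimed by theory, the grids are probably not yet in the asymptotic range, or the scheme loses accuracy on that mesh.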

P.W.  Sacher.  DASA.  Germany 

Six  years  ago  we  had  a  big  Symposium  on  the  subject  of  Code 
Validation,  CFD  Validation.  This  was  a  subject  that  I  missed  here, 
specifically  in  your  remarks,  in  your  evaluation.  Does  it  mean  that 
code  validation  is  no  longer  an  issue  for  CFD? 

J.W. Slooff. NLR. Netherlands

I  am  pretty  sure  that  Dr.  Kroll  is  going  to  say  that  it  is  still  an 
issue,  but  that  it  was,  on  purpose,  outside  of  the  scope  of  this 
Conference.  Are  there  any  further  direct  comments  on  the  Technical 
Evaluator's  remarks? 

F. Mokhtarian. Canadair, Canada

I  just  wanted  to  say  that  I  agree  with  most  of  the  comments  of  the 
Technical  Evaluator  regarding  the  quality  of  the  Conference  and  the 
papers.  However,  the  comment  I  would  like  to  make  is  that  I  thought 
his  comparison  of  the  Conference  with  some  of  the  other  conferences 
was  perhaps  a  bit  unfair.  This  isn't  the  first  time  I  have  attended 
an  AGARD  Conference.  I  have  been  to  many  other  conferences  in  Canada 
and  the  U.S.  and  you  always  get  a  variety  of  papers,  you  can't  always 
be  very  critical.  It  is  very  difficult  to  tell  the  quality  of  some 
of  the  papers  ahead  of  time.  There  were  some  papers  I  thought 
perhaps  were  not  exactly  up  to  par,  but  there  were  lots  and  lots  of 
papers  that  were  very  high  quality  and  I  was  glad  I  was  able  to 
attend. The only comment I wanted to make is that I think he was a
bit harsh comparing this Conference with some of the other
Conferences.

N. Kroll. DLR. Germany

Essentially,  I  think  you  are  right,  but  you  have  to  read  again  the 
title  of  the  Conference.  The  Conference  has  the  title,  "Progress  and 
Challenges  of  CFD  Methods".  This  is  an  ambitious  title.  You  have  to 
make  sure  that  most  of  the  papers  of  the  Conference  will  meet  the 
high  demands  of  the  Symposium. 

J. van Ingen. Delft University. Netherlands

In regard to the remark by Prof. Essers about the boundary conditions
on an AGARD meeting, I think we as a Panel have to think about what
especially  is  the  place  of  AGARD  in  all  of  these  CFD  conferences.  I 
think  CFD  conferences  organized  by  AGARD  should  have  some  specific 
ideas  in  it  to  bring  together  the  users  and  developers  and  maybe  we 
should  leave  the  more  fundamental  subjects  to  a  special  conference  on 
that.  The  present  criticism  may  be  due  to  the  boundary  conditions  we 
have  to  put  on  AGARD  conferences. 

J.A.  Essers.  University  of  Liege,  Belgium 

Just  to  see  if  we  agree,  let  me  just  make  a  few  comments  and  ask  a 
question  to  Dr.  Kroll.  I  think  that,  as  you  said,  there  were  very 
good  papers  in  this  Conference,  there  were  some  that  were  not  as 
good,  of  course.  Unfortunately,  very  few  papers  here  had  a 
sufficient  vision  of  the  future.  But  that  remark  is  also  valid  for 
many  conferences.  Nevertheless,  I  agree  that  the  title  of  the 
conference  was  perhaps  too  ambitious,  but  we  couldn't  know  that 
before  receiving  the  submitted  abstracts.  I  also  would  have  liked  to 
hear  more  people  saying  why  they  use  this  technique  instead  of 
another;  why  it  is  more  appropriate  because  in  the  future  they  want 
to  tackle  that  problem  and  that  problem.  For  example,  I  would  have 
liked somebody making an Euler calculation to explain that he uses that
scheme instead of another one because he thinks that that scheme will
be better for future viscous flow calculations. Nobody discussed
such issues. Is that perhaps what you wanted to say, Dr. Kroll?

N.  Kroll.  DLR.  Germany 

This  is  exactly  what  I  wanted  to  say.  I  tried  to  define  challenges 
on the different subjects which should be addressed by the CFD
community.

A.G. Panaras. HAF Academy. Greece

I  think  that  it  should  be  more  appropriate  to  state  that  progress  has 
been  reported  on  some  new  ideas  and  not  to  make  the  distinction 
between good and not good papers. Many authors have made substantial
efforts  in  preparing  their  work  and  certainly  there  is  always 
something  new  that  comes  in  a  conference  like  this. 

J.W.  Slooff.  NLR.  Netherlands 

Now  with  your  permission  I  would  like  to  switch  to  the  second  part  of 
this  discussion.  I  would  like  to  get  you  thinking,  if  not  talking, 
about  three  key  questions  that  we  have  to  deal  with  and  that  we  have 
to  get  answers  for.  To  trigger  the  discussion  a  bit  further  and  to 
perhaps provoke you a little bit into making comments, let me try to
list what I think are industry's three most important requirements.
One  is  to  increase  the  confidence  level  of  the  codes,  which  means, 
from  my  point  of  view,  that  for  every  application  you  would  like  to 
know  what  the  accuracy  is.  I  don't  think  there  are  many  codes  that, 
together with the Cp distributions and what have you, provide an
estimate  of  the  accuracy.  I  think  that  is  something  that  we  should 
strive  for.  Robustness,  that  is  clear,  is  another  aspect.  Reduction 
of  the  problem  turnaround  time  has  also  been  mentioned  by  Dr.  Kroll, 
of  course.  Here  grid  generation  is  the  bottleneck,  in  particular  for 
the  structured  grid  approach.  That  leads  us  to  efficiency.  We  may 
loosely  define  efficiency  as  accuracy  divided  by  cost,  and  cost  is 
more  or  less  proportional  to  time.  We  can  distinguish  between 
manpower  time  needed,  particularly  for  preprocessing,  that  is 
geometry  handling  and  grid  generation  and  the  pure  computer  time. 

I  don't  think  the  post  processing  part  of  it  is  a  big  deal  here.  If 
we  look  at  that  formula  and  address  the  different  parts  in  it,  we 
know  that  accuracy  is  in  the  first  place  a  function  of  the  physical 
model,  including  the  turbulence  model  (that  was  on  purpose  not 
addressed  here  at  this  Conference).  The  other  important  parameters 
are  the  number  of  grid  points,  the  distribution  of  the  grid  points, 
the "order" of the method plus the artificial dissipation models and
whatever flux upwinding or multidimensional upwinding scheme is used.
On the cost side we have preprocessing as I already mentioned, plus
the CPU time. For the latter, the number of grid points is again a
parameter, plus the "numerical" scheme, the solution algorithm, and
of  course,  the  hardware.  The  latter  is,  however,  beyond  the  scope  of 
this  Conference. 
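
The loose definition of efficiency given above can be written out as a small sketch. Everything numeric here is a placeholder chosen for illustration, not data from the Symposium. Accuracy is assumed to be measured as the inverse of a relative error, and cost as the plain sum of manpower hours and CPU hours, which is only one of several possible weightings. The class `Approach` and its methods are hypothetical names.

```python
from dataclasses import dataclass


@dataclass
class Approach:
    name: str
    relative_error: float       # smaller is better (placeholder value)
    preprocessing_hours: float  # manpower time: geometry handling, grid generation
    cpu_hours: float            # pure computer time

    def accuracy(self) -> float:
        # Assumption: accuracy is taken as the inverse of the relative error.
        return 1.0 / self.relative_error

    def cost(self) -> float:
        # Assumption: cost is proportional to total time, weighted equally.
        return self.preprocessing_hours + self.cpu_hours

    def efficiency(self) -> float:
        # Efficiency loosely defined as accuracy divided by cost.
        return self.accuracy() / self.cost()


# Placeholder numbers only; they do not describe any real code or grid type.
baseline = Approach("approach A (placeholder)", 0.02, 40.0, 10.0)
alternative = Approach("approach B (placeholder)", 0.025, 8.0, 12.0)

for approach in (baseline, alternative):
    print(f"{approach.name}: efficiency = {approach.efficiency():.2f}")
```

With these placeholder inputs, the second approach is slightly less accurate but comes out ahead because most of its saving is in preprocessing time, which is the kind of trade-off the formula is meant to expose.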

What  I  would  like  to  do  is  to  discuss  the  current  main  developments 
in  CFD  against  the  background  of  industry  requirements,  the 
efficiency  requirement  in  particular.  If,  as  a  baseline,  we  take  the 
currently  well-established  multi-block  structured  type  of  codes  with 
conventional  types  of  schemes,  let  us  say  the  Jameson  Flo-57,  -67 
level  of  technology,  we  can  try  to  estimate  where  we  can  improve, 
relative  to  that  baseline  situation.  For  the  unstructured  grid 
approach  we  may,  for  the  same  number  of  grid  cells,  have  perhaps 
somewhat  less  accuracy  than  for  a  structured  grid  solution.  I  am  not 
completely  sure  about  that,  and  you  may  wish  to  comment.  However, 
there  is  certainly  a  lot  of  gain  in  the  grid  generation  part.  If  we 
look  at  adaptive  grids,  it  is  clear  that  unstructured  as  well  as 
structured  adaptive  grids  have  a  potential  for  increasing  the 
accuracy for a given number of grid cells. However, grid adaptivity
also has the prospect of reducing the preprocessing and the grid
generation calendar time. This is because with adaptation the first
grid  you  start  with  doesn't  have  to  be  as  good  as  is  the  case  when 
you  do  not  have  an  adaptive  grid  approach.  The  CPU  aspect  for  given 
accuracy  is  also  clear.  I  think  the  biggest  advantages  for  adaptive 
grids are in the unstructured case. I think there we have a bigger
potential  for  gain  in  accuracy  for  a  given  number  of  grid  cells  or 
reduction  of  the  number  of  grid  cells  for  given  accuracy  than  in  the 
case  of  structured  grids.  Higher  order  schemes,  multi-dimensional 
upwinding  and  similar  refinements,  are,  of  course,  good  for  accuracy. 
I  am  not  quite  sure  what  it  means  for  grid  generation.  I  have  the 
feeling  that  some  of  these  more  subtle  schemes  may  require  better 
meshes than more conventional schemes. They also may require some
more  computer  time,  at  least  for  the  same  number  of  grid  points. 
However,  because  the  number  of  grid  points  required  for  a  certain 
accuracy  level  may  be  less,  we  may  still  gain  something.  I  don't 
know  what  the  balance  is.  You  may  wish  to  comment  on  that. 

One  further  remark,  on  adaptive  grids.  We  might  wonder  what  is  more 
efficient  -  to  implement  a  highly  sophisticated  higher  order  scheme 
with  the  best  thinkable  multi-dimensional  upwinding  with  only  one 
grid  point  in  the  shock  wave,  or  to  have  an  adaptive  grid  scheme 
with,  for  example,  two  or  three  grid  points  in  the  shock.  I  am  not 
sure  which  of  the  two  is  more  efficient.  I  will  stop  here.  This  is 
just  a  little  bit  of  provocation  in  order  to  get  you  out  of  your 
seats,  so  to  speak.  Who  would  like  to  shoot  at  this  or  anything 
else? 

P.E.  Rubbert.  Boeing  Commercial  Airplane  Group,  U.S. 

It is important to speak to how good you have to be, what the
target is. Not just faster, but how fast, etc. One of the things I
seem  to  detect  at  this  Conference  is  that  many  of  the  speakers  had  in 
their  mind  a  different  definition  of  the  decimal  point  than  I  do.  I 
heard  talk  about  working  hard  on  grid  generation  to  reduce  the  time 
from  three  weeks  to  maybe  one  week  or  maybe  one  day.  My  experience 
in  using  CFD  in  an  airplane  design  environment  is  that  when  you  are 
talking  about  designing  a  wing,  it  wasn't  too  many  years  ago  that 
that  involved  a  sequence  of  about  75  full  blown  CFD  runs,  part 
analysis,  part  inverse  design,  etc.  One  day  turnaround  was 
unacceptable.  We  do  not  want  to  take  75  days  to  design  wings.  The 
decimal  point  belongs  in  terms  of  hours,  not  days.  In  our  old  design 
environment,  our  target  was  to  get  three  turnarounds  in  an  8  hour  day 
in  the  design  environment.  The  challenge  is  now  to  reduce  cycle  time 
even  more.  So  I  think  it  is  worth  saying  that  some  of  the  targets 
that  I  hear  people  setting  for  themselves  will  produce  a  capability 
which  is  not  really  acceptable  and  useable  in  a  real  airplane  company 
environment. 
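
The arithmetic behind this remark can be made explicit. The sketch below assumes the 75 runs are strictly sequential (each run depends on the previous result), which is a simplification; `design_days` is a hypothetical helper.

```python
import math


def design_days(runs, turnarounds_per_day):
    """Working days needed for a sequence of dependent CFD runs."""
    return math.ceil(runs / turnarounds_per_day)


RUNS_PER_WING = 75      # figure quoted in the discussion
WORKING_DAY_HOURS = 8   # figure quoted in the discussion

one_per_day = design_days(RUNS_PER_WING, 1)
three_per_day = design_days(RUNS_PER_WING, 3)
hours_per_turnaround = WORKING_DAY_HOURS / 3

print(one_per_day, three_per_day, round(hours_per_turnaround, 2))
```

At one turnaround per day the wing takes 75 working days; at three per day it takes 25, with roughly 2.7 hours available for each turnaround.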

J.W.  Slooff.  NLR.  Netherlands 

Thank  you  for  that  comment,  and  it  reminded  me  that  I  forgot  to 
mention one aspect in relation to high order schemes and accuracy.

We  are  not  looking  for  infinite  improvements  in  accuracy.  What  we 
need  is,  for  a  given  accuracy  that  we  want  to  obtain,  but  not 
necessarily  want  to  exceed,  the  highest  efficiency,  the  shortest 
preprocessing  turnaround  time  and  the  lowest  CPU  cost.  Higher  order 
methods  usually  have  their  greatest  benefit  if  you  require  very  high 
accuracy.  If  you  have  lower  accuracy  requirements  they  may  not  be  so 
well  suited  for  the  purpose.  In  industry,  you  probably  will  agree 
with  me,  different  levels  of  accuracy  are  needed  in  different  phases 
of  the  design  process.  Industry  is  not  always  looking  for  the 
highest  accuracy.  That  is  something  we  also  have  to  bear  in  mind  in 
considering  higher  order  methods. 

P.E. Rubbert. Boeing Commercial Airplane Group, U.S.

The  subject  of  accuracy  -  another  thing  I  did  not  hear  at  the 
Conference  was  any  discussion  of  CFD  with  respect  to  the  environment 
in  which  we  use  it.  I  think  it  is  very  important  that  we  learn  how 
to think about what we want from CFD in the presence of wind tunnel
analysis  and  the  other  tools  that  we  have  for  doing  airplane  design. 
For  example,  the  question  of  accuracy.  I  heard  many  times  people 
setting  goals  like  we  would  like  CFD  to  be  able  to  calculate  drag  at 
this  level  of  accuracy,  and  so  forth,  but  the  way  we  really  do  it  in 
industry  is  that  we  don't  depend  on  any  one  tool  to  give  us  the  total 
answer.  The  total  answer  is  arrived  at  by  utilizing  all  of  the 
information  at  your  disposal;  the  information  that  CFD  provides,  the 
information  that  wind  tunnels  provide,  your  previous  experience,  etc. 
Integrate  that  all  together  into  a  judgement  as  to  what  something 
like  the  drag  would  be.  Again,  when  we  talk  about  accuracy  of  CFD, 
if  it  is  going  to  take  us  6  months  to  build  a  wind  tunnel  model  and 
test  it,  that  means  one  thing  in  terms  of  the  amount  of  accuracy  you 
need  out  of  CFD.  But  I  heard  some  discussion  this  week  about 
stereolithography  methods,  and  things  like  that  that  could  lead  you 
in  the  direction  of  what  one  might  call  overnight  model 
manufacturing. If that happens in the wind tunnel, that has a major
influence  on  the  type  of  accuracy  levels  you  would  need  out  of  CFD. 

If  you  could  rapidly  get  a  number  out  of  the  wind  tunnel,  maybe  you 
don't  need  to  focus  so  hard  on  CFD  accuracy.  I  guess  my  point  is  we 
have  to  stop  looking  at  CFD  by  itself.  We  have  to  learn  to  look  at 
it  with  respect  to  the  total  environment. 

D.  Knight.  Rutgers  University.  U.S. 

I  would  like  also  to  focus  on  this  question  of  accuracy.  As  I  have 
often  understood  it,  it  seems  to  be  more  of  a  question  of  accuracy  as 
a  function  of  resource  rather  than  resource  as  a  function  of 
accuracy.  Typically,  for  example,  if  you  want  to  compute  the  total 
pressure  recovery  in  an  inlet  in  an  industrial  environment,  the 
question  is  how  long  it  will  take  to  get  within  a  certain  accuracy. 
That may be 1% for the total pressure recovery; if it is a design, it
may in fact be even smaller or perhaps larger. I think that we, in the
CFD community, do not yet focus enough on the question: given a level of
accuracy of a particular type, like total pressure recovery, what is
the resource required to get that? If you are in industry and you
have a week to do a computation, can you actually predict the total
pressure within 1%, or should you not try at all? Maybe that will
take  2  weeks  and  that  is  the  information  that  you  need  to  know.  This 
also  raises  the  question  of  optimal  design:  the  optimal  design  of 
your  algorithm  in  terms  of  reconstruction  of  high  order  methods,  and 
also  the  optimal  design  of  your  grid  structure  within  that  algorithm. 
That,  of  course,  brings  to  the  fore  the  question  of  an  estimate  for 
the  accuracy  of  your  scheme.  That  is  an  issue  that  was  mentioned  in 
a  number  of  papers  including  the  earlier  one  this  morning  by 
Friedrichs. In the CFD community we do not yet have a good
measure  of  accuracy,  and  how  to  predict  that  from  our  solution. 

S.V. Ramakrishnan. Rockwell Science Center, U.S.

I  have  one  comment  on  hybrid  grids.  From  our  experience  in 
generating such grids, I can say that most of the difficulty lies in
the region near the body surface for complex configurations. If you
can  develop  a  structured  grid  near  the  body,  you  might  as  well 
develop  such  a  grid  everywhere,  because  it  doesn't  take  too  much  work 
to  develop  the  grid  away  from  the  body  surface.  Therefore,  if  we 
cannot solve the viscous problem with an unstructured grid, we may as
well not use it at all. There is no point in using hybrid grids.


P.G.C.  Herring.  British  Aerospace  Ltd,  U.K. 

A comment first on adaptive grids. I am not sure if the size of the
pluses and minuses is an indication of the potential benefit, but
some of the work that we have been doing is beginning to indicate
that in some cases it may not be worth using adaptive grids.

The  time  it  takes  you  to  develop  the  procedure  and  run  the  codes  is 
often  longer  than  it  would  take  just  running  2  or  3  cases  of  an 
ordinary  grid.  The  other  thing  that  surprises  me  is  your  last  line 
on  parallel  algorithms  which  indicates  that,  for  CFD  fluid  problems, 
there  is  not  much  benefit  to  be  gained  by  going  to  parallelization. 
You  have  a  small  positive  in  the  last  column.  As  I  say,  if  the  size 
of  the  plus  is  an  indication  of  the  benefit,  it  appears  we  are  not 
yet  ready  for  parallelization  with  CFD. 

J.W.  Slooff.  NLR.  Netherlands 

I  hasten  to  say  that  I  have  not  been  very  consistent  with  the  sizing 
of  the  pluses.  But  I  do  think  personally  that  a  good  grid  adaptivity 
scheme is one of the most important things that we have to go after.

J.A.  Essers.  University  of  Liege,  Belgium 

I  have  two  very  specific  questions.  The  first  one  is  about 
chemically  reacting  flows.  It  is  a  question  for  Dr.  Radespiel  or  Dr. 
Marmignon.  Perhaps  someone  can  answer  it.  Well,  in  the  abstracts  we 
received  no  proposals  on  the  following  subject.  In  the  past,  we 
expected  that  there  would  be  some  developments  on  techniques  using 
different  grids  for  different  equations,  for  example,  relatively 
coarse  grids  for  the  flow  equations  and  the  finer  grids  for  the 
chemical  reaction  equations.  I  heard  nothing  on  that  issue  in  this 
Symposium.  Is  that  idea  still  around  or  is  it  forgotten  now? 

The  second  question  is  concerning  the  DNS  method.  In  the  past,  I 
expected  that  DNS  would  provide  some  kind  of  numerical  wind  tunnel  or 
experiment to construct better turbulence models, for classical
turbulence modelling or, for example, for LES. Nobody addressed
that  subject.  My  question  is,  at  this  time,  are  there  some  people 
who  use  DNS  to  try  to  construct  better  models  for  turbulence  or  not? 
Usually,  when  I  attend  a  talk  on  DNS  I  hear  nothing  about  that. 

C. Marmignon. ONERA, France

I would like you to be more specific with the question if possible.

J.A. Essers. University of Liege, Belgium

In the past some people, I am thinking in particular of Marsha Berger of
the Courant Institute, but I am not sure, suggested that some people in the
U.S. were working in the field mentioned in my first question, i.e.,
the use of different grids when you have chemical reactions. Some
reactions have very short relaxation times, leading to very sharp
gradients  in  a  shock  wave  for  example,  so  it  could  be  worthwhile  to 
discretize  the  kinetic  equation  corresponding  to  that  reaction  with  a 
very  fine  grid  and  perhaps  to  use  a  coarser  grid  for  another  chemical 
reaction  and  still  a  coarser  grid  for  the  flow  equations.  So  you 
could imagine having a series of grids, three grids, for example.
Of course, the points of the coarse grid would also be points of the
finest  grid,  like  in  multigrid  techniques  I  would  say,  and  for  some 
equations  you  would  only  discretize  them  on  the  coarse  grids  and 
interpolate  the  results  for  the  fine  grids  in  order  to  save  time.  Is 
there  some  research  going  on  in  that  field?  Maybe  it  is  not 
interesting,  I  don't  know. 

C. Marmignon. ONERA. France

We have not looked at this point.

J.W.  Slooff.  NLR.  Netherlands 

On  the  last  question,  that  is  the  DNS,  LES  question,  I  think  I  saw 
three hands up there.

B. Geurts. University of Twente, Netherlands

The question you raised is, as explicitly mentioned, not within the
scope  of  this  Symposium.  If  you  are  interested  in  it,  I  would  like 
to  refer  you  to  some  of  the  work  at  Twente  where  we  try  to  use  DNS  as 
a  data  base  for  developing  subgrid  models  for  LES  which  is  an 
intermediate  step  for  possible  extension  to  Reynolds  averaged 
turbulence  modelling  improvement.  We  are  not  unique  in  the  world, 
there  are  several  groups  that  have  similar  approaches  in  which  they 
start  from  DNS. 

P.  Comte.  LEGI.  Institut  de  Mecanique  de  Grenoble.  France. 

I  think  all  the  LES  community  has  tested  models  in  comparison  with 
DNS; however, DNS is currently restricted to fairly low Reynolds
number  flows.  If  we  want  to  use  LES  for  higher  Reynolds  numbers, 
maybe  those  comparisons  wouldn't  be  that  relevant. 

N.  Kroll.  DLR.  Germany 

I  just  want  to  make  a  short  comment  on  chemically  reacting  flows.  In 
my  opinion  the  most  severe  problem  is  the  stiffness  of  the  discrete 
system.  I  think  you  cannot  overcome  this  problem  by  using  different 
mesh  types.  You  have  to  develop  efficient  algorithms  to  overcome 
that  stiffness  problem. 
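
The stiffness problem mentioned here can be illustrated on a scalar model equation. The sketch below is a toy under stated assumptions, not the method used in any of the papers: a single relaxation equation dy/dt = -k (y - y_eq) with a large rate constant k stands in for fast chemistry, and explicit and implicit Euler steps are compared at the same time step. There is no spatial mesh in the model, which is the point: the time step limit comes from the rate constant.

```python
def explicit_euler(y, y_eq, k, dt, steps):
    """Forward Euler for dy/dt = -k (y - y_eq); stable only if dt < 2/k."""
    for _ in range(steps):
        y = y + dt * (-k * (y - y_eq))
    return y


def implicit_euler(y, y_eq, k, dt, steps):
    """Backward Euler for the same equation; stable for any positive dt."""
    for _ in range(steps):
        # Solve y_new = y + dt * (-k * (y_new - y_eq)) for y_new (linear, closed form).
        y = (y + dt * k * y_eq) / (1.0 + dt * k)
    return y


K = 1000.0   # fast relaxation rate (assumed value)
DT = 0.01    # five times the explicit stability limit 2/K = 0.002
Y0 = 1.0     # initial state
Y_EQ = 0.0   # equilibrium state
STEPS = 20

err_explicit = abs(explicit_euler(Y0, Y_EQ, K, DT, STEPS) - Y_EQ)
err_implicit = abs(implicit_euler(Y0, Y_EQ, K, DT, STEPS) - Y_EQ)

print(f"explicit Euler distance from equilibrium: {err_explicit:.3e}")
print(f"implicit Euler distance from equilibrium: {err_implicit:.3e}")
```

At this time step the explicit update multiplies the deviation by -9 every step and diverges, while the implicit update divides it by 11 and relaxes to equilibrium, which is why the remedy is sought in the solution algorithm.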

J. Jimenez. Escuela Superior de Ingenieros Aeronauticos, Spain

The question of the relationship between DNS and modelling is
something  that  has  been  considered  for  several  years.  It  is  a 
question  of  what  to  expect.  You  cannot  expect  DNS  to  give  you  a 
model. That has to be done by modellers. What DNS gives you is
"ground truth". It gives you what the real flow is doing, and it
gives  you  constraints  on  which  models  work  and  which  ones  do  not. 

This  has  been  practiced  extensively  now,  at  the  CTR  in  Stanford,  at 
Twente,  as  reported  in  this  meeting,  and  at  many  other  places.  There 
are  cases  in  which  DNS  is  almost  the  only  data  available,  as  in  the 
case  of  stress  balances  in  separated  flows,  which  are  difficult  to 
measure  and  difficult  to  model,  but  which  have  been  computed  with 
DNS.  You  can  use  those  data  to  check  whether  a  particular  model
works  or  not  and,  if  it  does  not,  it  is  up  to  the  modeller  to  come  up
with  a  better  one.  That  last  step  was,  of  course,  outside  the  scope
of  the  present  meeting.  DNS  can  do  this,  and  it  can  give  you  some
ideas  of  how  to  improve  your  model,  but  it  cannot  produce  a  model  by
itself.  It  is  as  difficult  to  get  good  models  out  of  DNS  as  it  has
been  to  get  them  from  experimental  data.  There  is  nothing  magic
about  DNS.  It  is  just  a  better  experiment.

J.A.  Essers,  University  of  Liege,  Belgium

I  don't  know,  but  I  suppose  it  is  easier  to  get  a  lot  of  data  from  a
calculation  like  DNS  than  from  experiments.  I  don't  know  a  lot  in 
that  field,  but  I  would  see  DNS  as  a  kind  of  experimental  facility 
that  can  provide  you  with  a  lot  of  information  if  you  need  it. 

J.  van  Ingen,  Delft  University,  Netherlands

I  think  we  should  not  forget  that  there  has  been  a  time,  I  refer  to 
the  Stanford  trials  in  '68  and  '80,  when  the  idea  was  that  we  just 
had  to  wait  for  the  ultimate  turbulence  model  and  all  our  problems 
would  have  been  solved.  Then,  I  think  around  the  '80's,  people 
started  to  realize  that  there  is  not  a  single  turbulence  model.  You 
will  have  models  for  different  kinds  of  flows.  So  if  you  say  DNS  is 
providing  a  different  approach  to  experiments,  yes,  but  you  will  need 
experiments,  hence  also  these  numerical  experiments,  in  these 
different  kinds  of  flow.  Having  a  problem  with  calculating  high 
Reynolds  numbers  will  remain  as  long  as  you  cannot  do  DNS  for  these 
high  Reynolds  numbers.

J.W.  Slooff.  NLR.  Netherlands 

I  thank  you  all  for  your  contribution  to  this  discussion,  and  in 
particular  Dr.  Kroll  for  giving  us  the  starting  point  for  the  Round 
Table  Discussion.  I  think  Prof.  Essers  would  now  like  to  formally 
close  this  Conference. 

J.A.  Essers,  University  of  Liege,  Belgium

I  will  make  it  very  short.  First  of  all,  I  would  like  to  say  that  in 
my  opinion,  the  Conference  was  satisfactory  from  several  viewpoints. 
First  of  all  in  terms  of  attendance,  if  the  attendance  is  a  measure 
of  the  usefulness  to  the  NATO  people.  I  just  would  like  to  let  you 
know  that  we  had  124  attendees,  including  observers,  Panel  Members
and  authors.  For  those  of  you  who  are  interested  in  this,  I  just 
show  you  a  distribution  of  attendance  per  country  so  it  could  be 
perhaps  useful  to  some  of  you.  That  is  just  for  statistics,  let  us 
say.  Now  concerning  the  technical  content,  let  me  just  say  that  I 
agreed  with  many  of  the  things  Dr.  Kroll  said.  Anyway,  I  think  that
during  this  week  we  could  at  least  answer  some  questions.  For
example,  we  know,  which  was  perhaps  not  obvious  for  all  of  us,  that
there  is  still  a  lot  of  exciting  future  for  CFD.  That  is  great
because  otherwise  many  of  us  would  become  unemployed  in  the  near
future.  I  feel  also  that  we  are  all  convinced  that  there  is  no  way 
that  CFD  could  replace  wind  tunnels  in  the  future.  They  have  to  work 
together  and  they  are  very  complementary  to  each  other.  Their 
complementary  role  should  still  be  reassessed  and  used  more 
intensively  in  the  future.  Then  I  have  some  conclusions  concerning 
some  work  that  could  be  done  in  the  future.  For  example,  I  believe 
that  that  issue  on  grids  will  be  very  important  and  I  think  that 
there  is  no  way  to  say  that  structured  grids  or  unstructured  grids 
will  be  better.  They  have  to  be  used  together.  For  example,  I  would 
like  to  remind  you  of  that  idea  of  hybrid  grids  and  overlapping 
grids,  and  all  these  things.  I  have  already  been  interested  in  the 
fact  that  you  can  have  good  error  detectors  and  good  error 
estimators.  It  would  be  nice  if  we  could  generalize  them  to  the
transient  error  and  to  evaluate  the  error  due  to  the  time 
discretization.  That  would  help  a  lot  in  unsteady  flow  calculations. 
I  also  appreciated  the  talks  on  DNS.  For  example,  it  was  very  clear 
that  DNS  could  only  be  used  in  some  small  parts  of  the  configuration, 
for  example,  and  then  the  issue  would  be  to  develop  multi-block 
techniques  using  DNS  in  some  blocks,  and  some  other  models  in  other 
blocks.  The  communication  between  blocks  obviously  still  has  to  be 
defined.  This  is  an  important  issue  for  the  future.  Finally,  I 
believe  that  to  accelerate  the  calculations,  which  I  believe  is 
something  very  important,  we  should  both  use  more  efficient  numerical
techniques,  like  implicit  techniques  and  so  on,  and  have
efficient  computers.  I  liked  a  lot  the  talks  on  parallel  computing,
namely  the  talk  by  Prof.  Knight.  I  must  confess  that  many  of  us  who 
don't  use  parallel  computing  are  a  little  bit  scared  of  using  it.
But  I  think  we  should  do  it  anyway,  or  we  will  be  out  of  business.  I
would  however  feel  concerned  by  the  portability  issue.  If  you  say  it 
could  become  very  portable,  it  would  be  nice  to  use  it. 

To  close  this  meeting,  I  would  like  to  thank  all  of  the  people  who 
contributed  to  the  success  of  this  Conference.  I  will  not  thank  each 
of  them  separately  because  this  will  be  done  by  Christian  Dujarric  in 
a  few  minutes.  I  just  would  like  to  say  that  I  am  grateful  to  the 
authors  who  prepared  good  papers;  in  particular,  I  feel  very 
satisfied  by  the  fact  that  we  received  a  copy  of  all  of  the  papers 
now,  which  is  not  so  usual  in  AGARD  Conferences,  so  you  can  go  back 
home  with  copies  of  all  the  papers.  That  is  good  in  itself.  I  would 
also  like  to  thank  the  Programme  Committee  members,  the  session 
chairmen  and  the  Technical  Evaluator,  Dr.  Kroll,  who  did  a  great  job
in  my  opinion,  and  also  the  Spanish  organizers.  They  had  planned
everything  including  very  good  weather  and  they  had  a  great  party  on 
Monday.  There  were  very  nice  facilities.  I  would  like  to  thank  you 
for  your  attendance  at  this  Symposium.  I  hope  that  you  will  go  back
home  and  remember  this  Conference  as  useful  for  your  work.  I  hope 
that  it  will  be  very  rewarding  for  your  career,  and  wish  you  a  good 
trip  back  home. 

C.  Dujarric,  Chairman,  Fluid  Dynamics  Panel

Thank  you,  Prof.  Essers.  Ladies  and  Gentlemen,  we  have  now  come  to
the  end  of  our  Symposium.  I  think  that  we  have  identified  together 
promising  research  orientations.  The  scientific  material  will  permit 
each  of  us  to  formulate  recommendations  for  our  respective 
organizations  on  the  aspects  of  the  use  of  numerical  methods  for 
aerodynamics  which  particularly  merit  our  efforts  for  their
development.  The  Fluid  Dynamics  Panel  will  use  the  results  of  this
Conference  as  one  of  the  elements  for  its  contribution  to  Working 
Group  Aerospace  2020.  This  Working  Group  will  present  to  the  highest 
authorities  of  NATO  the  recommendations  concerning  the  technological 
efforts  required  to  provide  the  Alliance,  by  2020,  with  radically
improved  military  capacity  in  spite  of  the  expected  tight  budgetary
pressures.

A  symposium  regrouping  all  the  panels  of  AGARD  is  planned  in  Paris  in 
the  Spring  of  1997  to  present  the  conclusions  of  this  Working  Group. 
This  meeting  will  have  in  attendance  military  authorities,  industry 
representatives  and  researchers.  This  will  be  for  us  an  occasion  to
deliver  our  message,  and  I  hope  that  many  of  you  will  participate. 

This  Symposium,  which  has  just  finished,  has  been  very  well  attended,
as  Prof.  Essers  has  mentioned.  In  spite  of  some  of  the  points  made  by
Dr.  Kroll,  it  was  very  fruitful.  The  Program  Committee  deserves  our 
congratulations.  We  thank  Prof.  Essers,  the  Chairman  of  the  Program 
Committee.  We  also  thank  the  members  of  the  Committee,  Prof. 
Deconinck,  Prof.  Kind,  Prof.  Bonnet,  M.  Jacquotte,  M.  Lacau,  Dr. 
Korner,  Prof.  Panaras,  M.  Borsi,  Prof.  Slooff,  Dr.  Ytrehus,  Prof. 
Falcao,  Dr.  Corral,  Prof.  Jimenez,  Prof.  Kaynak,  Dr.  Poll,  Prof. 
Cantwell  and  Dr.  Lekoudis.  We  warmly  thank  all  the  authors  and  all 
of  you  who  have  helped  us  to  have  a  lively  discussion.  We  also  thank 
the  Technical  Evaluator,  Dr.  Kroll,  who  has  presented  his  point  of 
view  regarding  our  work.  These  comments  will  be  attached  to  the 
publication  of  the  Round  Table  Discussion.  A  remarkable  job  of 
organization  was  done  to  permit  us  to  have  our  Symposium.  I  would 
like  to  thank  on  behalf  of  the  Fluid  Dynamics  Panel,  the  Spanish 
authorities,  in  particular  the  National  Delegates,  for  the  invitation 
to  hold  this  meeting  in  Seville.  I  remind  you  that  the  Minister  of 
Defense  for  Spain  and  INTA  have  contributed  to  making  our  stay  so 
agreeable  by  financing  the  organization  of  our  Conference.  We  are 
very  grateful.  We  thank  in  particular,  Lt.  General  Mira  Perez  for 
the  wonderful  evening  we  had  last  Monday. 

We  especially  thank  our  Local  Coordinator,  Prof.  Javier  Jimenez  as 
well  as  Miss  C.  Gonzalez  Hernandez,  Spanish  National  Coordinator. 

This  Conference  would  not  have  been  possible  without  the  complicated 
logistics  whose  operation  relies  largely  on  good  will.  So  we  thank 
the  interpreters  who  have  succeeded  in  translating  in  spite  of  the 
very  technical  character  of  our  remarks,  considering  especially  the 
level  of  difficulty  of  doing  so  with  certain  speakers,  perhaps  myself 
included.

We  thank  the  technicians  for  keeping  the  equipment  functioning,  the 
hostesses,  as  well  as  the  people  who  welcomed  us  and  helped  in  the 
smooth  running  of  the  Conference. 

Lastly,  we  thank  the  Secretary  of  our  Panel,  Anne-Marie  Rivault,  who 
has  just  received  the  AGARD  Personnel  Medal  for  her  devotion  to  the 
FDP  and  who  participates  for  the  last  time  in  a  Symposium  before 
taking  her  well-deserved  retirement. 

We  also  thank  the  Panel's  Executive,  Mr.  Jack  Molloy  for  his  very 
effective  support  in  the  preparation  of  this  Conference. 

Now  I  would  like  to  present  you  with  our  program  for  1996.  We  will 
have  in  the  Spring  a  Symposium  on  the  Characterization  and 
Modification  of  Wakes  from  Lifting  Vehicles.  This  will  take  place  in
Trondheim  in  Norway  from  the  20th  to  the  23rd  of  May,  1996.  In  the
Fall,  if  everything  goes  well,  we  will  organize  for  the  first  time  in 
the  history  of  AGARD,  a  Symposium  in  Moscow.  It  demonstrates  the 
recent  opening  up  toward  the  countries  of  the  old  Soviet  bloc.  It 
will  cover  the  Aerodynamics  of  Wind  Tunnel  Circuits  and  Their 
Components.  The  Russians  have  a  great  deal  of  experience  in  this 
field  and  have  promised  to  share  this  expertise.  We  begin  a  new
level  of  cooperation  which  will  be  technically  extremely  fruitful  for 
us.  We  will  also  have  in  1996,  two  special  courses  at  VKI,  one  on
Advances  in  Cryogenic  Wind  Tunnel  Technology,  and  the  other  on 
Aerothermodynamics  and  Propulsion  Integration  for  Hypersonic 
Vehicles.  You  are  all  invited  to  participate  in  our  future  programs,
and  I  hope  to  have  the  pleasure  to  meet  you.  Thank  you  for  your 
attention. 